muttest


Coverage tells you which lines ran. It says nothing about whether your tests would catch a bug. You can delete every assertion, run covr, and still see 100%.

{muttest} measures the quality of your tests — not just how much code they execute.

The problem with coverage alone

covr tells you which lines were executed. It cannot tell you whether your assertions are strong enough to catch a real bug. A test suite full of expect_true(is.numeric(x)) checks will reach 100% coverage while missing every meaningful failure.

Mutation testing addresses this gap by asking a harder question: if this code were subtly wrong, would your tests notice?

The need for mutation testing in the age of LLMs

Many teams now use LLMs to write their tests. LLMs are good at producing syntactically correct, passing tests quickly — but they might cover only the obvious cases and miss the boundaries:

# What an LLM may write for is_adult():
test_that("is_adult works", {
  expect_true(is.logical(is_adult(25))) # checks return type, not logic
  expect_true(is_adult(25))             # clearly an adult
  expect_false(is_adult(10))            # clearly a minor
})

# What actually catches the >= vs > boundary bug:
test_that("is_adult handles the boundary age", {
  expect_true(is_adult(18))    # kills the >= → > mutant
})

Both test suites pass. Both have 100% coverage. Only one would catch a developer accidentally writing age > 18 instead of age >= 18.
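The difference can be reproduced without any testing framework. The sketch below is only an illustration in base R: is_adult_buggy, obvious_cases and boundary_case are hypothetical names introduced here, and each "suite" is reduced to a function that reports whether all of its checks pass for a given implementation.

```r
# Correct implementation and the off-by-one variant described above.
is_adult       <- function(age) age >= 18
is_adult_buggy <- function(age) age > 18

# Hypothetical stand-ins for the two test suites: each takes an
# implementation and returns TRUE only if every check passes.
obvious_cases <- function(impl) {
  isTRUE(impl(25)) && isFALSE(impl(10))
}
boundary_case <- function(impl) {
  isTRUE(impl(18))
}

results <- c(
  obvious_on_correct  = obvious_cases(is_adult),
  obvious_on_buggy    = obvious_cases(is_adult_buggy),  # bug goes unnoticed
  boundary_on_correct = boundary_case(is_adult),
  boundary_on_buggy   = boundary_case(is_adult_buggy)   # bug is caught
)
print(results)
```

Only the boundary check returns FALSE for the buggy variant, which is the signal a mutation score is meant to surface.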

Mutation testing gives you a score that reflects assertion quality, not just execution. It gives you a concrete way to understand the real strength — and the real gaps — in an LLM-generated test suite.

How it works

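{muttest} makes small, deliberate changes (mutants) to your source code and reruns your tests against each one. A mutant is killed when at least one test fails and survives when every test still passes. This reveals whether your tests are asserting the right things.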

{muttest} not only gives you the score, but it also tells you which files need stronger assertions.
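To make the mutate-and-rerun idea concrete, here is a toy sketch in base R. It is not how {muttest} is implemented. The helpers swap_operator() and passes() are hypothetical and exist only for this illustration, and the "test suite" is reduced to two checks.

```r
# Function under test.
is_adult <- function(age) {
  age >= 18
}

# Hypothetical helper: return a copy of `fn` in which every call to the
# operator `from` is replaced by a call to the operator `to`.
swap_operator <- function(fn, from, to) {
  walk <- function(expr) {
    if (!is.call(expr)) {
      return(expr)
    }
    if (identical(expr[[1]], as.name(from))) {
      expr[[1]] <- as.name(to)
    }
    as.call(lapply(expr, walk))
  }
  body(fn) <- walk(body(fn))
  fn
}

# Hypothetical stand-in for a test suite: TRUE when all checks pass.
passes <- function(impl) {
  isTRUE(impl(25)) && isFALSE(impl(10))
}

# Build one mutant per replacement operator and rerun the checks.
replacements <- c(">", "<=")
survived <- vapply(
  replacements,
  function(op) passes(swap_operator(is_adult, ">=", op)),
  logical(1)
)
killed <- !survived

# Share of mutants that made at least one check fail.
score <- 100 * sum(killed) / length(killed)
print(killed)
print(score)
```

A mutant counts as killed when the checks stop passing. In this sketch the `<=` mutant is killed and the `>` mutant survives, because neither check exercises the value 18.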

Example

Given our codebase is:

#' R/is_adult.R
is_adult <- function(age) {
  age >= 18
}

And our tests are:

#' tests/testthat/test-is_adult.R
test_that("is_adult returns TRUE for adults", {
  expect_true(is_adult(25))
})

test_that("is_adult returns FALSE for minors", {
  expect_false(is_adult(10))
})

When running muttest::muttest() we’ll get a report of the mutation score:

withr::with_dir(system.file("examples", "boundary", package = "muttest"), {
  plan <- muttest::muttest_plan(
    mutators = muttest::comparison_operators()
  )
  muttest::muttest(plan)
})
#> ℹ Mutation Testing
#>   |   K |   S |   E |   T |   % | Mutator  | File 
#> ✔ |   1 |   0 |   0 |   1 | 100 | >= → <=  | is_adult.R 
#> x |   1 |   1 |   0 |   2 |  50 | >= → >   | is_adult.R 
#> 
#> Duration: 1.99 s
#> 
#> ── Survived Mutants ────────────────────────────────────────────────────────────
#> is_adult.R  >= → >
#>   2-   age >= 18
#>   2+   age > 18
#> 
#> ── Results ─────────────────────────────────────────────────────────────────────
#> [ KILLED 1 | SURVIVED 1 | ERRORS 0 | TOTAL 2 | SCORE 50.0% ]

The mutation score is \(\text{Mutation Score} = \frac{\text{Killed Mutants}}{\text{Total Mutants}} \times 100\%\), where a mutant is a variant of the original code that is used to test the robustness of the test suite.
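As a quick arithmetic check of the formula against the report above, the following base R sketch recomputes the score. mutation_score() is a hypothetical helper, not part of {muttest}, and reading the K, S, T and % columns as running totals is an assumption based on the numbers shown.

```r
# Hypothetical helper: the score formula written as a function.
mutation_score <- function(killed, total) {
  stopifnot(total > 0, killed >= 0, killed <= total)
  100 * killed / total
}

# Outcome of each mutant in report order:
# ">= -> <=" was killed, ">= -> >" survived.
killed <- c(TRUE, FALSE)

# Running score after each mutant (assumed meaning of the % column).
running <- mutation_score(cumsum(killed), seq_along(killed))

# Final score over all mutants.
final <- mutation_score(sum(killed), length(killed))

print(running)
print(final)
```

With one of two mutants killed, the final value is 50, in line with the SCORE shown in the results line.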

comparison_operators() generates mutants by swapping each comparison operator for related alternatives. For >= it produces two mutants:

#' R/is_adult.R — mutant 1: ">=" -> ">"
is_adult <- function(age) {
  age > 18
}
#' R/is_adult.R — mutant 2: ">=" -> "<="
is_adult <- function(age) {
  age <= 18
}

Tests are run against both mutants.

Mutant 2 (>= → <=) is killed: is_adult(25) now returns FALSE, which fails the first test.

Mutant 1 (>= → >) survives: is_adult(25) still returns TRUE and is_adult(10) still returns FALSE. The boundary value 18 is never tested, so the test suite cannot tell >= from >.

#' tests/testthat/test-is_adult.R
test_that("is_adult returns TRUE for adults", {
  # ✔ Kills mutant 2 (<=): is_adult(25) returns FALSE
  # 🟢 Doesn't kill mutant 1 (>): is_adult(25) still returns TRUE
  expect_true(is_adult(25))
})

test_that("is_adult returns FALSE for minors", {
  # 🟢 Doesn't kill mutant 1 (>): is_adult(10) still returns FALSE
  # ✔ Also kills mutant 2 (<=): is_adult(10) returns TRUE, so expect_false() fails
  expect_false(is_adult(10))
})

We have killed 1 mutant out of 2, so the mutation score is 50%. The survivor tells us exactly what to fix — add a test at the boundary:

test_that("is_adult returns TRUE at the boundary age", {
  expect_true(is_adult(18))  # kills mutant 1: age > 18 returns FALSE for age = 18
})

With this test added the score reaches 100%.

Available mutators

A mutator describes one kind of code change. Pass a list of mutators to muttest_plan() to control what gets mutated.

Individual mutators

| Function | Description | Example |
| --- | --- | --- |
| operator() | Mutate a binary operator | operator("+", "-"): a + b → a - b |
| boolean_literal() | Mutate a boolean literal | boolean_literal("TRUE", "FALSE"): TRUE → FALSE |
| na_literal() | Mutate an NA or NULL literal | na_literal("NA", "NULL"): NA → NULL |
| call_name() | Mutate a function call name | call_name("any", "all"): any(x) → all(x) |
| string_empty() | Mutate non-empty string literals to the empty string | string_empty(): "hello" → "" |
| string_fill() | Mutate the empty string literal to a placeholder string | string_fill(): "" → "mutant" |
| numeric_increment() | Increment numeric literals | numeric_increment(): 5 → 6 |
| numeric_decrement() | Decrement numeric literals | numeric_decrement(): 5 → 4 |
| index_increment() | Increment subscript indices | index_increment(): x[i] → x[i + 1L] |
| index_decrement() | Decrement subscript indices | index_decrement(): x[i] → x[i - 1L] |
| negate_condition() | Negate the condition of if/while statements | negate_condition(): if (x > 0) → if (!(x > 0)) |
| remove_condition_negation() | Remove negation from the condition of if/while statements | remove_condition_negation(): if (!done) → if (done) |
| remove_negation() | Remove logical negation | remove_negation(): !is.na(x) → is.na(x) |
| replace_return_value() | Replace the value in explicit return() calls | replace_return_value(): return(x) → return(NULL) |

Preset collections — return a ready-made list of mutators

| Function | Description | Example |
| --- | --- | --- |
| arithmetic_operators() | Arithmetic operator mutators | + ↔ -, * ↔ /, ^ → *, %% → *, %/% → / |
| comparison_operators() | Comparison operator mutators | < ↔ >, == ↔ !=, < → <=, > → >= |
| logical_operators() | Logical operator mutators | && ↔ \|\|, & ↔ \| |
| boolean_literals() | Boolean literal mutators | TRUE ↔ FALSE, T ↔ F |
| na_literals() | NA and NULL literal mutators | NA ↔ NULL, NA ↔ NA_real_, NA ↔ NA_integer_, NA ↔ NA_character_ |
| numeric_literals() | Numeric literal mutators | 5 → 6, 5 → 4 |
| index_mutations() | Index mutation mutators | x[i] → x[i + 1L], x[i] → x[i - 1L] |
| string_literals() | String literal mutators | "hello" → "", "" → "mutant" |
| condition_mutations() | Condition mutation mutators | if (x) → if (!(x)), if (!x) → if (x) |
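Presets and individual mutators can be mixed in one plan. The sketch below reuses the bundled boundary example from earlier and only the functions named on this page. It assumes that {muttest} and {withr} are installed, and that a preset returns a plain list, so that c() can append an individually constructed mutator to it. That last point is an assumption and not confirmed behaviour.

```r
# Runs only when both packages are available.
if (requireNamespace("muttest", quietly = TRUE) &&
    requireNamespace("withr", quietly = TRUE)) {
  withr::with_dir(system.file("examples", "boundary", package = "muttest"), {
    # Assumption: a preset is a plain list of mutators, so one more
    # mutator can be appended with c().
    mutators <- c(
      muttest::comparison_operators(),
      list(muttest::numeric_increment())
    )
    plan <- muttest::muttest_plan(mutators = mutators)
    muttest::muttest(plan)
  })
}
```

Starting from a preset and adding single mutators keeps the plan short while still targeting the kind of change that matters for a particular file, here the literal 18 in is_adult.R.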

Where to go next