---
title: "Audit workflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Audit workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE
)
```

# Geographic Data Transformation Audit Workflow

## Overview

This workflow measures how much data values are perturbed when variables are transformed across geographic boundaries. It exposes two decision points that applied work typically leaves implicit, and it stays variable-agnostic throughout.

## Libraries (minimal)

```{r libs, message=FALSE, warning=FALSE}
library(geoDeltaAudit)
library(dplyr)
library(readr)
library(stringr)
library(janitor)
```

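## Setup and Data Preparation

The toy, relationship-defined baseline ships with the package as `toy_acs_zcta_hennepin.csv` under `extdata` and is located with `system.file()`. Step A loads it, zero-pads ZCTA codes to five characters, builds a toy 1:1 ZCTA-to-ZIP association in which each ZIP reuses its ZCTA's ID, and summarises the association by row count and distinct ID counts.

## Step A: Association Construction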
```{r assoc-build, message=FALSE, warning=FALSE}
## --- load toy baseline (relationship-defined) ---
acs_path <- system.file("extdata", "toy_acs_zcta_hennepin.csv", package = "geoDeltaAudit")
stopifnot(nchar(acs_path) > 0)

acs_zcta_hennepin <- readr::read_csv(acs_path, show_col_types = FALSE) %>%
  janitor::clean_names() %>%
  dplyr::mutate(zcta = stringr::str_pad(as.character(.data$zcta), 5, pad = "0"))

# Toy assoc: 1:1 ZCTA -> ZIP (same 5-digit IDs)
zcta_zip_hennepin <- acs_zcta_hennepin %>%
  dplyr::distinct(.data$zcta) %>%
  dplyr::transmute(zcta = .data$zcta, zip = .data$zcta) %>%
  dplyr::distinct()

assoc_structure <- zcta_zip_hennepin %>%
  dplyr::summarise(
    n_rows  = dplyr::n(),
    n_zctas = dplyr::n_distinct(.data$zcta),
    n_zips  = dplyr::n_distinct(.data$zip)
  )

assoc_structure
```

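### Association diagnostics

Two checks characterize the association: how many baseline ZCTAs have no row in the association table (unmapped), and how many ZIPs each ZCTA maps to (fan-out). The call `diagnostics <- audit_association(assoc_table)` followed by `print(diagnostics)` is not evaluated here because `assoc_table` is not defined in this vignette; the chunk below computes both checks directly on `zcta_zip_hennepin`.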

```{r assoc-diagnostics, message=FALSE, warning=FALSE}
unmapped <- acs_zcta_hennepin %>%
  dplyr::anti_join(zcta_zip_hennepin %>% dplyr::distinct(.data$zcta), by = "zcta")

fanout_stats <- zcta_zip_hennepin %>%
  dplyr::count(.data$zcta, name = "n_zip") %>%
  dplyr::summarise(
    min    = min(.data$n_zip),
    median = median(.data$n_zip),
    mean   = mean(.data$n_zip),
    max    = max(.data$n_zip)
  )

list(
  n_unmapped_zctas = nrow(unmapped),
  fanout = fanout_stats
)
```
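
Because the toy association above is 1:1, no ZCTA is unmapped and every fan-out statistic equals 1. The sketch below repeats the same two checks on a small hypothetical association so that non-trivial output is visible. The IDs and the names `toy_baseline`, `toy_assoc`, `toy_unmapped`, and `toy_fanout` are made up for illustration and are not package data. In this sketch one ZCTA is unmapped and the maximum fan-out is 2.

```{r assoc-diagnostics-sketch, message=FALSE, warning=FALSE}
library(dplyr)

# Hypothetical baseline with three ZCTAs (made-up IDs, not package data)
toy_baseline <- dplyr::tibble(zcta = c("00001", "00002", "00003"))

# Hypothetical association: "00002" fans out to two ZIPs, "00003" is absent
toy_assoc <- dplyr::tibble(
  zcta = c("00001", "00002", "00002"),
  zip  = c("00001", "00002", "00009")
)

# Baseline ZCTAs with no row in the association
toy_unmapped <- toy_baseline %>%
  dplyr::anti_join(toy_assoc %>% dplyr::distinct(.data$zcta), by = "zcta")

# Distribution of ZIPs per mapped ZCTA
toy_fanout <- toy_assoc %>%
  dplyr::count(.data$zcta, name = "n_zip") %>%
  dplyr::summarise(
    min    = min(.data$n_zip),
    median = median(.data$n_zip),
    mean   = mean(.data$n_zip),
    max    = max(.data$n_zip)
  )

list(n_unmapped_zctas = nrow(toy_unmapped), fanout = toy_fanout)
```
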

## Key assumption

Crosswalks are directional allocations, not inverses. This audit treats each step as a one-way transformation and reports loss and fan-out at each stage.
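
A minimal sketch with made-up numbers illustrates why a crosswalk cannot simply be run backwards. The tables `src`, `fwd`, and `bwd`, their weights, and the unit IDs are hypothetical and are not drawn from the package data; the weighted-allocation form is an assumption chosen for illustration. Two source units are allocated to two target units with forward weights, then pushed back through a reverse crosswalk whose weights also sum to 1 per originating unit.

```{r directional-sketch, message=FALSE, warning=FALSE}
library(dplyr)

# Hypothetical source values
src <- dplyr::tibble(src_id = c("A", "B"), value = c(100, 50))

# Forward crosswalk: share of each source unit sent to each target unit
# (weights sum to 1 within each src_id)
fwd <- dplyr::tibble(
  src_id = c("A", "A", "B"),
  tgt_id = c("X", "Y", "Y"),
  w      = c(0.75, 0.25, 1.0)
)

# Allocate source values onto target units
tgt <- src %>%
  dplyr::inner_join(fwd, by = "src_id") %>%
  dplyr::group_by(.data$tgt_id) %>%
  dplyr::summarise(value = sum(.data$value * .data$w), .groups = "drop")

# Reverse crosswalk: share of each target unit sent back to each source unit
# (weights sum to 1 within each tgt_id); it is not the inverse of fwd
bwd <- dplyr::tibble(
  tgt_id = c("X", "Y", "Y"),
  src_id = c("A", "A", "B"),
  w      = c(1.0, 0.5, 0.5)
)

# Push target values back onto source units
round_trip <- tgt %>%
  dplyr::inner_join(bwd, by = "tgt_id") %>%
  dplyr::group_by(.data$src_id) %>%
  dplyr::summarise(value_back = sum(.data$value * .data$w), .groups = "drop")

# Compare original and round-tripped values per source unit
comparison <- src %>%
  dplyr::left_join(round_trip, by = "src_id") %>%
  dplyr::mutate(delta = .data$value_back - .data$value)

comparison
```

In this sketch the total is preserved (150 before and after), but unit A comes back as 112.5 instead of 100 and unit B as 37.5 instead of 50. The round trip moves value between units even though every weight set sums to 1, which is the kind of perturbation a one-way audit is meant to surface.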


## Interpreting Results Tables

```{r hennepin-pngs, echo=FALSE, message=FALSE, warning=FALSE, results="asis"}
knitr::include_graphics(c(
  "baseline_hennepin.png",
  "hennepin_relationship.png"
))
```


## Visualizing Perturbations


## What this vignette demonstrates

This vignette shows how `geoDeltaAudit` separates **data values** from **geographic transformation rules**.

The maps above visualize how identical source values can yield different spatial memberships depending on whether boundaries are defined by relationships or geometry. The numerical audit steps in other vignettes quantify the downstream effects of these choices.



This vignette is intentionally visual and descriptive. It does not perform transformations or inference.
