Introduction

{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE
)

Geographic Data Transformation Auditing: Why Variable Agnosticism Matters

The Problem in Population Health

For the first hop, we build a ZCTA–ZIP association table from the ZIP→ZCTA relationship file by grouping ZIPs under their assigned ZCTA. The result is a one-to-many mapping (ZCTA→{ZIPs}) that mirrors the “lookup-style” workflows common in applied settings.
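The grouping step can be sketched in base R as follows. This is a minimal illustration, not the analysis code: the data frame `zip_to_zcta`, its column names, and the placeholder codes (`P1`, `Z1`, and so on) are all assumptions made up for the example.

```r
# Hypothetical ZIP -> ZCTA relationship file; codes are invented placeholders.
zip_to_zcta <- data.frame(
  zip  = c("P1", "P2", "P3", "P4", "P5"),
  zcta = c("Z1", "Z2", "Z3", "Z1", "Z3"),
  stringsAsFactors = FALSE
)

# Group ZIPs by their assigned ZCTA: a named list, ZCTA -> {ZIPs}.
zcta_to_zips <- split(zip_to_zcta$zip, zip_to_zcta$zcta)

# The same association in long form, one row per (ZCTA, ZIP) pair.
row_order <- order(zip_to_zcta$zcta, zip_to_zcta$zip)
zcta_zip_table <- zip_to_zcta[row_order, c("zcta", "zip")]
rownames(zcta_zip_table) <- NULL

# Each ZIP appears exactly once, but a ZCTA may list several ZIPs.
zips_per_zcta <- lengths(zcta_to_zips)
```

In this toy input, `Z1` and `Z3` each collect two ZIPs while `Z2` keeps one, which is the one-to-many shape described above.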

This construction does not imply bidirectionality (it is not a valid inverse crosswalk) and does not encode proportional allocation. It is used solely to quantify how a typical boundary-translation workflow can alter aggregate estimates under an explicit allocation rule.
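To make the role of the allocation rule concrete, the sketch below compares two hypothetical rules on invented numbers: copying each ZCTA's count to every associated ZIP, and splitting the count equally across them. Neither rule is claimed to be the one used in this analysis, and every name and value in the block is an assumption for illustration.

```r
# Hypothetical inputs: all codes and counts are invented for illustration.
zcta_counts <- data.frame(
  zcta  = c("Z1", "Z2"),
  cases = c(120, 30),
  stringsAsFactors = FALSE
)
zcta_zip <- data.frame(
  zcta = c("Z1", "Z1", "Z1", "Z2"),
  zip  = c("P1", "P2", "P3", "P4"),
  stringsAsFactors = FALSE
)

# Attach each ZCTA's count to every ZIP it maps to.
joined <- merge(zcta_zip, zcta_counts, by = "zcta")

# Rule A (naive lookup): every ZIP inherits the full ZCTA count.
joined$cases_copied <- joined$cases

# Rule B (equal split): divide the count by the number of ZIPs in the ZCTA.
n_zips <- table(joined$zcta)
joined$cases_split <- joined$cases / as.numeric(n_zips[as.character(joined$zcta)])

# Compare the aggregate before and after translation under each rule.
totals <- c(
  original = sum(zcta_counts$cases),
  copied   = sum(joined$cases_copied),
  split    = sum(joined$cases_split)
)
print(totals)
```

With these inputs the copied rule inflates the total from 150 to 390, while the equal split preserves 150. The table itself carries no weights, so any such rule has to be stated explicitly.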

Decision Points Framework

st_intersects() performs a literal spatial test, so its matches can differ from the relationship file:
• Any ZCTA polygon that touches Hennepin County, however slightly, counts as a match.
• That includes edge-touching polygons, slivers, water boundaries, and irregular TIGER topology.
• The “relationship file” is not the same thing. It is a curated linkage that reflects Census tabulation logic, not raw polygon contact.
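The behavior described above can be reproduced with toy geometries. The sketch below assumes the sf package is installed and uses invented squares in place of real county and ZCTA polygons. The helper `square()` and the area-share diagnostic are hypothetical additions for illustration, not part of the original workflow.

```r
library(sf)

# Hypothetical helper: an axis-aligned square with its lower-left corner at (xmin, ymin).
square <- function(xmin, ymin, size = 1) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Invented stand-ins: a 2 x 2 "county" and three 1 x 1 "ZCTAs" (no CRS set).
county <- st_sfc(square(0, 0, size = 2))
zctas <- st_sfc(
  square(0.5, 0.5),  # fully inside the county
  square(2, 0),      # shares only an edge with the county
  square(1.9, 1)     # overlaps the county as a thin sliver
)

# Literal spatial test: any contact at all counts.
hits <- st_intersects(zctas, county, sparse = FALSE)[, 1]

# Boundary-only contact, with no shared interior.
touch_only <- st_touches(zctas, county, sparse = FALSE)[, 1]

# Share of each ZCTA's area that falls inside the county.
share <- vapply(seq_along(zctas), function(i) {
  overlap <- st_intersection(zctas[i], county)
  if (length(overlap) == 0) return(0)
  sum(as.numeric(st_area(overlap))) / as.numeric(st_area(zctas[i]))
}, numeric(1))

print(data.frame(hits, touch_only, share))
```

All three squares pass st_intersects(), even though one shares only an edge and another has about a tenth of its area inside. An area-share or st_touches() check is one possible way to flag such cases before comparing against the relationship file.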

Variable Agnosticism Design