| Title: | Clean and Harmonise 'Malawi Integrated Household Survey' Data |
| Version: | 0.2.1 |
| Description: | An offline suite of tools to clean, aggregate, and harmonise data from the 'Malawi Integrated Household Survey' ('IHS'). Provides crop-specific unit conversions, stratified winsorization, and automatic cross-round harmonisation for complex survey designs. |
| License: | MIT + file LICENSE |
| Depends: | R (≥ 4.1.0) |
| URL: | https://github.com/vituk123/ihsMW |
| BugReports: | https://github.com/vituk123/ihsMW/issues |
| Imports: | dplyr (≥ 1.1.0), readr (≥ 2.1.0), rlang (≥ 1.1.0), cli (≥ 3.6.0) |
| Suggests: | srvyr (≥ 1.2.0), survey (≥ 4.2.0), testthat (≥ 3.0.0), usethis (≥ 2.2.0), pkgdown (≥ 2.0.0), knitr (≥ 1.40), rmarkdown (≥ 2.20), withr (≥ 2.5.0), jsonlite (≥ 1.8.0) |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| Language: | en-US |
| NeedsCompilation: | no |
| Packaged: | 2026-06-04 18:35:40 UTC; vitumbikokayuni |
| Author: | Vitumbiko Kayuni |
| Maintainer: | Vitumbiko Kayuni <vitumbikokayuni@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-04 20:30:02 UTC |
Smart Aggregation to Household Level
Description
Automatically detects variable types and applies sensible aggregations (e.g., 'sum' for continuous quantities, 'max' or logical OR for dummies). Throws warnings for ambiguous columns rather than failing silently.
Usage
ihs_aggregate(data, group_col = "case_id")
Arguments
| data | A data.frame at the individual or plot level |
| group_col | The column name identifying the household (e.g., "case_id" or "y4_hhid") |
Value
A data.frame aggregated to the household level
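The type-based rule described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helpers `is_dummy` and `aggregate_sketch` and the toy columns are hypothetical, and the dummy test (all non-missing values are 0 or 1) is only one plausible heuristic.

```r
# Toy individual-level data; column names are hypothetical.
individuals <- data.frame(
  case_id   = c("A", "A", "B", "B", "B"),
  wage      = c(100, 50, 0, 20, 30),  # continuous quantity -> sum
  has_phone = c(0, 1, 0, 0, 0)        # 0/1 dummy -> max (logical OR)
)

# Assumed heuristic: a numeric column is a dummy if it only holds 0 and 1.
is_dummy <- function(x) is.numeric(x) && all(x[!is.na(x)] %in% c(0, 1))

aggregate_sketch <- function(data, group_col = "case_id") {
  groups <- sort(unique(data[[group_col]]))
  out <- data.frame(groups, stringsAsFactors = FALSE)
  names(out) <- group_col
  for (col in setdiff(names(data), group_col)) {
    x <- data[[col]]
    if (!is.numeric(x)) {
      # Warn instead of silently guessing how to combine the values.
      warning("Ambiguous column skipped: ", col)
      next
    }
    fun <- if (is_dummy(x)) max else sum
    agg <- tapply(x, data[[group_col]], fun, na.rm = TRUE)
    out[[col]] <- as.numeric(agg[as.character(groups)])
  }
  out
}

households <- aggregate_sketch(individuals)
print(households)
```

In this sketch, household "A" gets the sum of its members' wages and a phone indicator of 1 because at least one member reported a phone.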
Clean and Harmonise IHS Data
Description
This wrapper function applies standard cleaning procedures to Malawi IHS data. It converts missing-value codes, winsorizes continuous variables, and records an audit log of all transformations applied.
Usage
ihs_clean(
data,
winsorize_vars = NULL,
winsorize_by = NULL,
probs = c(0.01, 0.99)
)
Arguments
| data | A data.frame (typically loaded from a '.dta' file) |
| winsorize_vars | Character vector of continuous variables to winsorize (e.g., consumption, harvest) |
| winsorize_by | Optional character string of a grouping variable (e.g., region) for stratified winsorization |
| probs | Numeric vector of length 2 specifying the lower and upper quantiles for winsorization. Default is 'c(0.01, 0.99)'. |
Value
A data.frame with cleaning applied. The returned object has an 'ihs_audit' attribute containing a log of modifications.
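The audit-attribute pattern can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `clean_sketch` is a hypothetical helper, the log is assumed to be a plain character vector, and the actual structure of 'ihs_audit' may differ.

```r
# Toy data; -99 stands in for a survey missing code.
survey <- data.frame(
  case_id     = c("A", "B", "C"),
  consumption = c(1200, -99, 950),
  age         = c(34, 51, 27)
)

# Hypothetical cleaner that logs every change it makes.
clean_sketch <- function(data, missing_codes = c(-99, -98)) {
  audit <- character(0)
  for (col in names(data)) {
    x <- data[[col]]
    if (!is.numeric(x)) next
    hit <- !is.na(x) & x %in% missing_codes
    if (any(hit)) {
      data[[col]][hit] <- NA
      audit <- c(audit, sprintf("%s: %d value(s) set to NA", col, sum(hit)))
    }
  }
  # Attach the log to the returned object as an attribute.
  attr(data, "ihs_audit") <- audit
  data
}

cleaned <- clean_sketch(survey)
audit_log <- attr(cleaned, "ihs_audit")  # read the log back
print(audit_log)
```

With the real function, the log would be read the same way, with `attr()` on the returned data.frame.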
Convert Agricultural Units to Kilograms
Description
Converts reported harvest units (e.g., Pails, Oxcarts, Heaps) into standard kilograms using official NSO crop-specific conversion factors.
Usage
ihs_convert_units(data, qty_col, unit_col, crop_col, unmapped = "warn")
Arguments
| data | A data.frame |
| qty_col | The name of the column containing the quantity |
| unit_col | The name of the column containing the unit code or name |
| crop_col | The name of the column containing the crop code |
| unmapped | Action to take when a unit cannot be mapped: '"warn"' (default), '"error"', or '"ignore"'. |
Value
A data.frame with a new qty_col_kg column.
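A crop-specific conversion amounts to a lookup on the (crop, unit) pair. The base R sketch below shows the idea under loud assumptions: the factors are invented for illustration and are not the official NSO values, and `convert_sketch`, the column names, and the output column `qty_kg` are hypothetical.

```r
# Hypothetical factors for illustration only; NOT the official NSO values.
factors <- data.frame(
  crop        = c("maize", "maize", "groundnut"),
  unit        = c("pail", "kg", "pail"),
  kg_per_unit = c(10, 1, 5)
)

harvest <- data.frame(
  crop = c("maize", "groundnut", "maize", "maize"),
  unit = c("pail", "pail", "kg", "oxcart"),
  qty  = c(2, 3, 40, 1)
)

convert_sketch <- function(data, factors,
                           unmapped = c("warn", "error", "ignore")) {
  unmapped <- match.arg(unmapped)
  # Match each row to its factor on the combined crop and unit key.
  idx <- match(paste(data$crop, data$unit, sep = "|"),
               paste(factors$crop, factors$unit, sep = "|"))
  if (anyNA(idx)) {
    msg <- sprintf("%d row(s) have no conversion factor", sum(is.na(idx)))
    if (unmapped == "error") stop(msg)
    if (unmapped == "warn") warning(msg)
  }
  # Rows without a factor get NA rather than a guessed weight.
  data$qty_kg <- data$qty * factors$kg_per_unit[idx]
  data
}

converted <- convert_sketch(harvest, factors, unmapped = "ignore")
print(converted)
```

The oxcart row has no factor in the toy table, so its converted quantity is NA. With `unmapped = "warn"` the same call would also raise a warning.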
Check the comparability of variables across IHS rounds
Description
Evaluates the completeness and comparability of variables across the available IHS rounds (IHS2, IHS3, IHS4, IHS5) using the bundled crosswalk.
Usage
ihs_crosswalk_check(verbose = TRUE)
Arguments
| verbose | Logical. If TRUE (the default), prints a summary of the crosswalk. |
Value
A tibble containing the full crosswalk. If verbose is TRUE, also prints a summary.
Examples
## Not run:
# Check the crosswalk and print a report
cw <- ihs_crosswalk_check()
## End(Not run)
Harmonise Raw IHS Data
Description
Takes a raw data.frame loaded from a Malawi IHS survey round (e.g. from a '.dta' file) and renames its columns to the standard harmonised variable names defined in the crosswalk.
Usage
ihs_harmonise(data, round = "IHS5", extra = FALSE)
Arguments
| data | A data.frame, typically read from a '.dta' file using |
| round | A character string specifying the IHS round (e.g., "IHS5", the default) |
| extra | Logical. If FALSE (default), drops columns that are not in the harmonisation crosswalk or standard ID columns. If TRUE, keeps all original columns. |
Value
A data.frame with columns renamed to standard 'harmonised_name's where applicable.
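Crosswalk-based renaming can be sketched in base R as below. The crosswalk layout (`round`, `raw_name`, `harmonised_name`), the raw variable names, and the helper `harmonise_sketch` are all assumptions made for illustration. The bundled crosswalk may be organised differently.

```r
# Hypothetical crosswalk; the real bundled crosswalk may differ.
crosswalk <- data.frame(
  round           = c("IHS5", "IHS5", "IHS4"),
  raw_name        = c("hh_a01", "hh_b04", "hh_b05a"),
  harmonised_name = c("district", "age", "age")
)

raw <- data.frame(
  case_id = c("1", "2"),
  hh_a01  = c(101, 102),
  hh_b04  = c(30, 41),
  junk    = c("x", "y")
)

harmonise_sketch <- function(data, crosswalk, round = "IHS5",
                             extra = FALSE, id_cols = "case_id") {
  cw <- crosswalk[crosswalk$round == round, ]
  idx <- match(names(data), cw$raw_name)
  mapped <- !is.na(idx)
  # Rename only the columns that the crosswalk knows about.
  names(data)[mapped] <- cw$harmonised_name[idx[mapped]]
  if (!extra) {
    # Keep mapped columns and ID columns; drop everything else.
    data <- data[, mapped | names(data) %in% id_cols, drop = FALSE]
  }
  data
}

slim <- harmonise_sketch(raw, crosswalk)                # drops "junk"
full <- harmonise_sketch(raw, crosswalk, extra = TRUE)  # keeps "junk"
print(names(slim))
print(names(full))
```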
Search for Manually Mapped Variables Across All IHS Rounds
Description
Searches the manual harmonisation crosswalk bundled with ihsMW for variables matching a keyword.
Usage
ihs_search(keyword, round = NULL, fields = c("name", "label", "module"))
Arguments
| keyword | A single search string to find (case-insensitive). |
| round | Limits search to a specific round. Valid inputs are |
| fields | A character vector of fields to include in the search. Valid fields are |
Value
A tibble with cross-round harmonised search results.
Examples
ihs_search("consumption")
ihs_search("expenditure", round = "IHS5")
ihs_search("age", fields = c("name", "label"))
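For intuition, a case-insensitive keyword search over selected fields can be sketched in base R. The toy crosswalk and the helper `search_sketch` are hypothetical. The sketch treats the keyword as a regular expression, which may not match how the package handles it.

```r
# Hypothetical crosswalk with the three searchable fields.
crosswalk <- data.frame(
  name   = c("cons_total", "hh_size", "educ_exp"),
  label  = c("Total household consumption", "Household size",
             "Education expenditure"),
  module = c("consumption", "roster", "education")
)

search_sketch <- function(crosswalk, keyword,
                          fields = c("name", "label", "module")) {
  stopifnot(all(fields %in% names(crosswalk)))
  # One logical vector per field, TRUE where the keyword matches.
  hits <- lapply(fields, function(f) {
    grepl(keyword, crosswalk[[f]], ignore.case = TRUE)
  })
  # A row is returned if any selected field matches.
  crosswalk[Reduce(`|`, hits), , drop = FALSE]
}

by_any  <- search_sketch(crosswalk, "CONSUMPTION")
by_name <- search_sketch(crosswalk, "exp", fields = "name")
print(by_any$name)
print(by_name$name)
```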
Standardize Survey Missing Codes
Description
Converts common negative missing codes (like -99 for "Refused" or -98 for "Don't Know") into standard R 'NA' values to prevent them from skewing numeric calculations.
Usage
ihs_standardize_missing(data)
Arguments
| data | A data.frame |
Value
A data.frame with missing values standardized
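The effect on numeric summaries is easy to see in a small base R sketch. The helper `standardize_sketch` is hypothetical, and the code list is limited to the two codes named in the description. The full set of codes the package recognises is not assumed here.

```r
# Toy data where -99 and -98 are missing codes, not real values.
survey <- data.frame(
  age     = c(34, -99, 51, -98),
  hh_size = c(4, 6, -99, 3)
)

standardize_sketch <- function(data, codes = c(-99, -98)) {
  # Replace the codes with NA in numeric columns only.
  data[] <- lapply(data, function(x) {
    if (is.numeric(x)) x[x %in% codes] <- NA
    x
  })
  data
}

cleaned <- standardize_sketch(survey)

raw_mean   <- mean(survey$age)                  # distorted by the codes
clean_mean <- mean(cleaned$age, na.rm = TRUE)   # based on real ages only
print(c(raw = raw_mean, clean = clean_mean))
```

Left in place, the two codes drag the mean age to -28. After conversion the mean is 42.5.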
Winsorize Continuous Variables
Description
Caps extreme values at the specified percentiles. Winsorization can be stratified (e.g., by region) so that thresholds reflect each group's own distribution and poor or rich areas are not over-trimmed. Results are written to new '_w'-suffixed columns, leaving the raw columns unchanged.
Usage
ihs_winsorize(data, vars, by = NULL, probs = c(0.01, 0.99))
Arguments
| data | A data.frame |
| vars | Character vector of column names to winsorize |
| by | Optional grouping variable name (e.g., "region") for stratified thresholds |
| probs | Numeric vector of lower and upper quantiles. Default 'c(0.01, 0.99)' |
Value
A data.frame with new '*_w' columns added.
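The difference between pooled and stratified thresholds can be sketched in base R. This is an illustration, not the package's implementation: `winsorize_sketch` is a hypothetical helper that uses the default `quantile()` method, and the example uses wider quantiles (5% and 95%) only because the toy data set is tiny.

```r
# Toy data: a poorer and a richer region, with one extreme value.
hh <- data.frame(
  region      = rep(c("North", "South"), each = 5),
  consumption = c(10, 12, 11, 13, 500, 100, 110, 105, 120, 115)
)

winsorize_sketch <- function(data, vars, by = NULL, probs = c(0.01, 0.99)) {
  # Cap a vector at its own lower and upper quantiles.
  cap <- function(x) {
    q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
    pmin(pmax(x, q[1]), q[2])
  }
  for (v in vars) {
    new_col <- paste0(v, "_w")  # raw column is left untouched
    data[[new_col]] <- if (is.null(by)) {
      cap(data[[v]])
    } else {
      # ave() applies cap() within each group, so thresholds are per stratum.
      ave(data[[v]], data[[by]], FUN = cap)
    }
  }
  data
}

pooled <- winsorize_sketch(hh, "consumption", probs = c(0.05, 0.95))
strat  <- winsorize_sketch(hh, "consumption", by = "region",
                           probs = c(0.05, 0.95))
print(strat)
```

With pooled thresholds, the caps are driven by the combined distribution of both regions. With `by = "region"`, each region is capped against its own quantiles.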