Help for package unicefData

Title:

Download Indicators from UNICEF Data Warehouse

Version:

2.3.0

Description:

An R client to fetch SDMX (Statistical Data and Metadata eXchange) CSV series from the UNICEF Data Warehouse https://data.unicef.org/. Part of a trilingual suite also available for 'Python' and 'Stata'. Features include automatic pagination, caching with memoisation, country name lookups, metadata versioning (vintages), and comprehensive indicator support for SDG (Sustainable Development Goals) monitoring.

Depends:

R (≥ 3.5.0)

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.3.3

Imports:

httr, readr, dplyr, tibble, xml2, memoise, countrycode, yaml, tools, jsonlite, magrittr, purrr, rlang, digest, tidyr

Suggests:

testthat (≥ 3.0.0), knitr, rmarkdown

VignetteBuilder:

knitr

Config/testthat/edition:

URL:

https://github.com/unicef-drp/unicefData, https://jpazvd.github.io/

BugReports:

https://github.com/unicef-drp/unicefData/issues

NeedsCompilation:

Packaged:

2026-02-25 02:33:03 UTC; jpazevedo

Author:

Joao Pedro Azevedo [aut, cre], Lucas Rodrigues [ctb], Yang Liu [ctb], Karen Avanesian [ctb]

Maintainer:

Joao Pedro Azevedo <jpazevedo@unicef.org>

Repository:

CRAN

Date/Publication:

2026-03-03 21:20:08 UTC

Store Schema in Cache

Description

Store Schema in Cache

Usage

.cache_schema(key, value, namespace = "schema")

Arguments

key

Cache key (typically indicator name)

value

Schema object to cache

namespace

Additional namespace for organization (optional)

Value

Invisibly returns the cached value

Create a vintage snapshot of current metadata

Description

Copies current metadata files to a dated vintage directory for historical reference and rollback capability.

Usage

.create_vintage(vintage_date, results, verbose = TRUE)

Arguments

vintage_date

Character string of vintage date (YYYY-MM-DD format)

results

List of sync results with counts

verbose

Logical for progress messages

Value

Invisible path to vintage directory

Create watermarked metadata structure

Description

Create watermarked metadata structure

Usage

.create_watermarked(
  content_type,
  source_url,
  content,
  counts,
  extra_metadata = NULL
)

Arguments

content_type

Type of content

source_url

Source URL

content

Content list

counts

Count statistics

extra_metadata

Additional metadata fields

Value

List with _metadata watermark and content

Create a list with standardized watermark header (matches Python format)

Description

Create a list with standardized watermark header (matches Python format)

Usage

.create_watermarked_list(
  content_type,
  source_url,
  content,
  counts,
  extra_metadata = NULL
)

Arguments

extra_metadata

Optional list of additional metadata fields (e.g., codelist_name)

Ensure indicator cache is loaded

Description

Ensure indicator cache is loaded

Usage

.ensure_cache_loaded(force_refresh = FALSE)

Arguments

force_refresh

Logical. If TRUE, always fetch fresh data

Value

Named list of indicator metadata

Fetch codelist from SDMX API

Description

Retrieves a codelist by ID and returns structured code information.

Usage

.fetch_codelist(codelist_id)

.fetch_codelist(codelist_id)

Arguments

codelist_id

Codelist ID

Value

List with codelist name and codes, or NULL on error

List with codes and metadata

Fetch indicator codelist from UNICEF SDMX API

Description

Fetch indicator codelist from UNICEF SDMX API

Usage

.fetch_indicator_codelist()

Value

Named list of indicator metadata

Fetch data from a single dataflow with 404 detection

Description

Low-level helper that fetches indicator data from a specific dataflow. Returns status "not_found" for 404 errors (allowing fallback to other dataflows) or throws for other errors.

Usage

.fetch_one_flow(
  indicator,
  dataflow,
  countries = NULL,
  start_year_str = NULL,
  end_year_str = NULL,
  max_retries = 3,
  version = "1.0",
  page_size = 1e+05,
  verbose = TRUE,
  totals = FALSE,
  labels = "id"
)

Arguments

indicator

Character vector of indicator codes

dataflow

Character string of dataflow ID

countries

Character vector of ISO3 country codes (optional)

start_year_str

Character string of start year (optional)

end_year_str

Character string of end year (optional)

max_retries

Integer number of retry attempts

version

SDMX version string (default "1.0")

page_size

Integer number of rows per page

verbose

Logical for progress messages

totals

Logical for including totals

labels

Label format ("id" or "name")

Value

List with status ("ok" or "not_found") and df (data.frame or NULL)

Fetch with retries

Description

Fetch with retries

Usage

.fetch_with_retry(url, max_retries = 3)

Arguments

url

URL to fetch

max_retries

Number of retries

Value

Response content or NULL

Fetch XML content from URL with retry logic

Description

Fetch XML content from URL with retry logic

Usage

.fetch_xml(url, retries = 3L)

Arguments

url

Character string of URL to fetch

retries

Integer number of retry attempts (default 3)

Value

Character string of XML content

Find metadata directory

Description

Find metadata directory

Usage

.find_metadata_dir()

Get basic dataflow info from _unicefdata_dataflows.yaml

Description

Get basic dataflow info from _unicefdata_dataflows.yaml

Usage

.get_basic_dataflow_info(dataflow, metadata_dir)

Get path to the indicator cache file

Description

Uses temporary session cache by default (tempdir()) to avoid writing to user's home directory without permission. For persistent caching, set UNICEF_DATA_HOME_R or UNICEF_DATA_HOME environment variable to explicitly specify a cache location.

Usage

.get_cache_path()

Retrieve Schema from Cache

Description

Retrieve Schema from Cache

Usage

.get_cached_schema(key, namespace = "schema")

Arguments

key

Cache key

namespace

Cache namespace

Value

Schema object if found, NULL otherwise

Get fallback indicator definitions

Description

Get fallback indicator definitions

Get fallback indicators if shared config not available

Usage

.get_fallback_indicators()

.get_fallback_indicators()

Value

List of common indicators

Get fallback dataflow sequences (lazy loading)

Description

Returns cached fallback sequences, loading from YAML on first access. Used for dataflow discovery when indicator is not in primary dataflow.

Usage

.get_fallback_sequences()

Value

Named list of fallback sequences by prefix, or NULL if not available

Get path to fallback sequences file

Description

Looks for canonical _dataflow_fallback_sequences.yaml Uses multiple fallback strategies for development and installed package scenarios

Usage

.get_fallback_sequences_path()

Get metadata directory path

Description

Get metadata directory path

Usage

.get_metadata_dir()

Value

Path to R/metadata/current/

Infer organizational CATEGORY from indicator code prefix

Description

DEPRECATED FOR DATAFLOW DETECTION: This function infers category for organizational grouping only. For actual API dataflow detection, use get_dataflow_for_indicator() which implements the canonical fallback sequence from _dataflow_fallback_sequences.yaml.

Usage

.infer_category(indicator_code)

Arguments

indicator_code

Character. The indicator code

Details

Note: Category != Dataflow. For example, WS_HCF_* indicators have category='WASH' but dataflow='WASH_HEALTHCARE_FACILITY'.

Value

Character. The inferred organizational category name

Check if cache is stale

Description

Check if cache is stale

Usage

.is_cache_stale(last_updated)

Arguments

last_updated

POSIXct. When cache was last updated

Value

Logical. TRUE if cache should be refreshed

Check if Schema is Cached

Description

Check if Schema is Cached

Usage

.is_cached(key, namespace = "schema")

Arguments

key

Cache key

namespace

Cache namespace

Value

Logical TRUE if schema is cached

Check if an error is an HTTP 404 response

Description

Check if an error is an HTTP 404 response

Usage

.is_http_404(e)

Arguments

e

An error condition object

Value

TRUE if the error represents a 404 Not Found, FALSE otherwise

Load cached indicator metadata

Description

Load cached indicator metadata

Usage

.load_cache()

Value

List with 'indicators' and 'last_updated' or NULL

Load fallback sequences from canonical YAML

Description

Loads dataflow fallback sequences used by all platforms for consistent indicator-to-dataflow resolution. Returns a named list where names are indicator prefixes and values are character vectors of dataflows to try.

Usage

.load_fallback_sequences()

Value

Named list of fallback sequences by indicator prefix

Load fallback dataflow sequences from canonical YAML (shared with Python/Stata)

Description

Reads _dataflow_fallback_sequences.yaml from the workspace root or package metadata. This ensures all languages (Stata, Python, R) use identical dataflow resolution logic.

Usage

.load_fallback_sequences_yaml()

Value

List with fallback sequences by indicator prefix

Load comprehensive indicators metadata from canonical YAML file

Description

Enables direct dataflow lookup by indicator code instead of using prefix-based fallback sequences. Much faster (O(1) vs trying multiple dataflows).

Usage

.load_indicators_metadata_yaml()

Value

List with indicators metadata (indicator code -> dataflow mapping, etc.)

Load aggregate/region ISO3 codes for geo_type classification

Description

Reads _unicefdata_regions.yaml to get codes for regions, income groups, and other aggregates. Returns a set of ISO3 codes for use in geo_type derivation. This ensures parity with Stata and Python implementations.

Usage

.load_region_codes_yaml()

Value

Character vector of ISO3 codes that are aggregates

Load shared indicators from config/common_indicators.yaml

Description

This ensures consistency across Python, R, and Stata platforms.

Usage

.load_shared_indicators()

Value

List of indicators or NULL if not found

Load YAML file from current metadata directory

Description

Load YAML file from current metadata directory

Usage

.load_yaml(filename)

Arguments

filename

Character string of filename (without path)

Value

List from parsed YAML, or empty list if file not found

Load YAML file from absolute path

Description

Load YAML file from absolute path

Usage

.load_yaml_from_path(filepath)

Arguments

filepath

Character string of absolute file path

Value

List from parsed YAML, or empty list if file not found

Parse SDMX codelist XML response

Description

Parse SDMX codelist XML response

Usage

.parse_codelist_xml(xml_content)

Arguments

xml_content

Character. Raw XML content from API

Value

Named list of indicator metadata

Resolve indicator category through multiple fallback strategies

Description

Tries: explicit category field -> parent code -> prefix-based inference. Used by list_categories() for accurate category counts.

Usage

.resolve_indicator_category(indicator_code, info)

Arguments

indicator_code

Character. The indicator code

info

Named list. The indicator metadata from cache

Value

Character. Resolved category name

Save indicator metadata to cache file

Description

Save indicator metadata to cache file

Usage

.save_cache(indicators)

Arguments

indicators

Named list of indicator metadata

Save data to YAML file in current metadata directory

Description

Writes data to a YAML file without line wrapping for cross-platform consistency.

Usage

.save_yaml(filename, data, output_dir)

.save_yaml(filename, data, output_dir)

Arguments

filename

Filename

data

Data to save

output_dir

Output directory

Value

Invisible filepath of saved file

Update sync history with new vintage entry

Description

Adds a new entry to the sync history file tracking metadata synchronizations.

Usage

.update_sync_history(vintage_date, results)

Arguments

vintage_date

Character string of vintage date (YYYY-MM-DD format)

results

List of sync results with counts and timestamps

Value

Invisible NULL

Convert R list to YAML without line wrapping

Description

Convert R list to YAML without line wrapping

Usage

.yaml_no_wrap(data, indent = 0)

.yaml_no_wrap(data, indent = 0)

Arguments

data

List to convert

indent

Current indentation level

Value

Character vector of YAML lines

Convert scalar value to YAML string

Description

Convert scalar value to YAML string

Usage

.yaml_scalar(x)

.yaml_scalar(x)

Arguments

x

Scalar value

Value

Character string in YAML format

Add country-level metadata columns

Description

Add country-level metadata columns

Usage

add_country_metadata(df, metadata_list)

Add indicator-level metadata columns

Description

Add indicator-level metadata columns

Usage

add_indicator_metadata(df, metadata_list)

Apply circa matching to find closest available years

Description

For each country, find observations closest to the target year(s). Different countries may have different actual years in the result.

Usage

apply_circa(df, target_years)

Arguments

df

Data frame with iso3, period, value columns

target_years

Vector of target years to match

Value

Data frame with observations closest to each target year

Apply format transformation

Description

Apply format transformation

Usage

apply_format(df, format, pivot = NULL)

Apply latest value filter

Description

Apply latest value filter

Usage

apply_latest(df)

Apply Most Recent Values filter

Description

Apply Most Recent Values filter

Usage

apply_mrv(df, n)

Build indicator catalog

Description

Builds indicator catalog from common SDG indicators. Tries to load from shared config/indicators.yaml first, falls back to hardcoded definitions if not found.

Usage

build_indicator_catalog(verbose = TRUE, use_shared_config = TRUE)

Arguments

verbose

Print progress messages

use_shared_config

Try to load from shared YAML config (default: TRUE)

Value

List with indicator metadata

Clean and Standardize UNICEF Data

Description

Renames columns and converts types.

Usage

clean_unicef_data(df)

Arguments

df

Data frame to clean.

Value

A cleaned data frame with standardized column names and types.

Clear the cached configuration

Description

Clear the cached configuration

Usage

clear_config_cache()

Value

Invisible NULL.

Clear the Schema Cache

Description

Remove all cached schemas from memory to free resources or refresh data.

Usage

clear_schema_cache()

Value

Invisibly returns NULL. Prints confirmation message.

Examples


  clear_schema_cache()
  # Cache: 0 items (0 MB)

Clear All UNICEF Caches

Description

Resets all in-memory caches across the package: indicator metadata, fallback sequences, region codes, schema cache, and config cache. After clearing, the next API call will reload all metadata from YAML files (or fetch fresh from the API if file cache is stale).

Usage

clear_unicef_cache(reload = TRUE, verbose = TRUE)

Arguments

reload

Logical. If TRUE (default), immediately reload YAML-based caches (indicators metadata, fallback sequences, region codes). If FALSE, caches are cleared but not reloaded until next use.

verbose

Logical. If TRUE, print what was cleared.

Value

Invisibly returns a named list of cleared cache names.

Examples


  # Clear everything and reload
  clear_unicef_cache()

  # Clear without reloading (lazy reload on next use)
  clear_unicef_cache(reload = FALSE)

Compare two metadata vintages

Description

Compares dataflows between two vintages to identify additions, removals, and modifications.

Usage

compare_vintages(vintage1, vintage2 = NULL, cache_dir = NULL)

Arguments

vintage1

Earlier vintage date (YYYY-MM-DD)

vintage2

Later vintage date (YYYY-MM-DD) or NULL for current

cache_dir

Optional cache directory path

Value

List with added, removed, and changed items

Examples

## Not run: 
# Compare historical vintage to current
changes <- compare_vintages("2025-11-15")

# Compare two historical vintages
changes <- compare_vintages("2025-10-01", "2025-11-15")

if (length(changes$added) > 0) {
  message(sprintf("New dataflows: %s", paste(changes$added, collapse = ", ")))
}

## End(Not run)

Compute hash of data frame for version tracking

Description

Compute hash of data frame for version tracking

Usage

compute_data_hash(df)

Arguments

df

Data frame to hash

Value

Character hash string (16 characters)

Create version record for a downloaded dataset

Description

Create version record for a downloaded dataset

Usage

create_data_version(df, indicator_code, version_id = NULL, notes = NULL)

Arguments

df

Downloaded data frame

indicator_code

Indicator code

version_id

Optional version identifier

notes

Optional notes about this version

Value

List with version metadata

Get dataflow schema information

Description

Display the dimensions and attributes for a UNICEF dataflow. Reads from local YAML schema files in metadata/current/dataflows/.

Usage

dataflow_schema(dataflow, metadata_dir = NULL)

Arguments

dataflow

Character. The dataflow ID (e.g., "CME", "EDUCATION").

metadata_dir

Optional path to metadata directory. Auto-detected if NULL.

Value

A list with components: id, name, version, agency, dimensions, attributes.

Examples


# Get schema for Child Mortality dataflow
schema <- dataflow_schema("CME")
print(schema$dimensions)
print(schema$attributes)

Detect Dataflow from Indicator

Description

Auto-detects the correct dataflow for a given indicator code.

Usage

detect_dataflow(indicator)

Arguments

indicator

Indicator code (e.g. "CME_MRY0T4")

Value

Character string of dataflow ID

Ensure metadata is synced and fresh

Description

Checks if metadata exists and is within max_age_days. If not, performs a sync automatically.

Usage

ensure_metadata(max_age_days = 30, verbose = FALSE, cache_dir = NULL)

Arguments

max_age_days

Maximum age in days before re-sync (default: 30)

verbose

Print messages

cache_dir

Optional cache directory path

Value

Logical indicating if sync was performed

Examples


# Check every 30 days (default)
ensure_metadata()

# Check every 7 days
ensure_metadata(max_age_days = 7)

Fetch SDMX content as text

Description

Fetch SDMX content as text

Usage

fetch_sdmx_text(url, ua = .unicefData_ua, retry)

Arguments

url

URL to fetch

ua

User agent string

retry

Number of retries

Value

Content as text

Fetch with retries

Description

Fetch with retries

Usage

fetch_with_retry(url, max_retries = 3)

Arguments

url

URL to fetch

max_retries

Number of retries

Value

Response object or NULL

Filter UNICEF Data (Sex, Age, Wealth, etc.)

Description

Filters data to specific disaggregations or defaults to totals. Uses indicator metadata (disaggregations_with_totals) to determine which dimensions have _T totals and should be filtered by default.

Usage

filter_unicef_data(
  df,
  sex = NULL,
  age = NULL,
  wealth = NULL,
  residence = NULL,
  maternal_edu = NULL,
  verbose = TRUE,
  indicator_code = NULL,
  dataflow = NULL
)

Arguments

df

Data frame to filter.

sex

Character string for sex filter (e.g. "F", "M", "_T").

age

Character string for age filter.

wealth

Character string for wealth quintile filter.

residence

Character string for residence filter.

maternal_edu

Character string for maternal education filter.

verbose

Logical, print progress messages.

indicator_code

Optional indicator code to enable metadata-driven filtering. Placed at end to preserve backward compatibility with existing positional calls.

dataflow

Optional dataflow name for dataflow-specific filtering logic. For NUTRITION dataflow, age defaults to Y0T4 instead of _T.

Value

A filtered data frame matching the specified disaggregation criteria.

Get Cache Info

Description

Get information about the current cache state.

Usage

get_cache_info()

Value

Named list with cache metadata

Examples


info <- get_cache_info()
print(info$cache_path)
print(info$indicator_count)

Get cached configuration (loads once, reuses thereafter)

Description

Get cached configuration (loads once, reuses thereafter)

Usage

get_cached_config(config_path = NULL)

Arguments

config_path

Optional explicit path to config file

Value

Full configuration list

Get metadata for a specific codelist

Description

Get metadata for a specific codelist

Usage

get_codelist_meta(codelist_id)

Arguments

codelist_id

Codelist identifier

Value

List with codelist metadata or NULL

Get path to the shared indicators.yaml config file

Description

Searches in order:

UNICEF_CONFIG_PATH environment variable
../../config/indicators.yaml relative to this file
./config/indicators.yaml relative to current working directory

Usage

get_config_path()

Value

Path to indicators.yaml

Get ISO3 to continent mapping

Description

Get ISO3 to continent mapping

Usage

get_continents()

Get ISO3 to UNICEF region mapping

Description

Get ISO3 to UNICEF region mapping

Usage

get_country_regions()

Get current metadata directory

Description

Get current metadata directory

Usage

get_current_dir()

Value

Path to current/ subdirectory

Get Dataflow for Indicator

Description

Returns the dataflow (category) for a given indicator code. This function automatically loads the indicator cache on first use, fetching from the UNICEF SDMX API if necessary.

Usage

get_dataflow_for_indicator(indicator_code, default = "GLOBAL_DATAFLOW")

Arguments

indicator_code

Character. UNICEF indicator code (e.g., "CME_MRY0T4")

default

Character. Default dataflow if indicator not found (default: "GLOBAL_DATAFLOW")

Details

IMPORTANT: Known dataflow overrides are checked FIRST, before the cache. This ensures problematic indicators (where the API metadata is wrong) always get the correct dataflow.

Value

Character. Dataflow name (e.g., "CME", "NUTRITION", "EDUCATION")

Examples


get_dataflow_for_indicator("CME_MRY0T4")
# Returns: "CME"

get_dataflow_for_indicator("NT_ANT_HAZ_NE2_MOD")
# Returns: "NUTRITION"

get_dataflow_for_indicator("ED_CR_L1_UIS_MOD")
# Returns: "EDUCATION_UIS_SDG" (uses override, not wrong cache value)

Get list of all UNICEF dataflows

Description

Get list of all UNICEF dataflows

Usage

get_dataflow_list(max_retries = 3)

Arguments

max_retries

Number of retries

Value

Tibble with dataflow info

Get metadata for a specific dataflow

Description

Get metadata for a specific dataflow

Usage

get_dataflow_meta(dataflow_id)

Arguments

dataflow_id

Dataflow identifier

Value

List with dataflow metadata or NULL

Get schema for a specific dataflow

Description

Get schema for a specific dataflow

Usage

get_dataflow_schema(dataflow_id, version = "1.0", max_retries = 3)

Arguments

dataflow_id

Dataflow ID

version

Dataflow version

max_retries

Number of retries

Value

List with dimensions, attributes, etc. or NULL

Get list of expected column names for a dataflow

Description

Get list of expected column names for a dataflow

Usage

get_expected_columns(dataflow_id, metadata_dir = NULL)

Arguments

dataflow_id

Dataflow ID (e.g., 'CME', 'NUTRITION')

metadata_dir

Directory containing dataflow_schemas.yaml

Value

Character vector of column names (dimensions + time + attributes)

Get fallback dataflows for an indicator

Description

Returns alternative dataflows to try when the primary dataflow fails. Uses comprehensive indicators metadata for direct lookup, falling back to prefix-based sequences from canonical YAML.

Usage

get_fallback_dataflows(original_flow, indicator_code = NULL)

Arguments

original_flow

Character string of the original dataflow that failed

indicator_code

Optional indicator code for direct metadata lookup

Value

Character vector of fallback dataflow IDs to try

Get ISO3 to World Bank income group mapping

Description

Get ISO3 to World Bank income group mapping

Usage

get_income_groups()

Get all available indicator codes

Description

Get all available indicator codes

Usage

get_indicator_codes(
  category = NULL,
  sdg_goal = NULL,
  dataflow = NULL,
  config_path = NULL
)

Arguments

category

Optional: filter by category

sdg_goal

Optional: filter by SDG goal

dataflow

Optional: filter by dataflow

config_path

Optional explicit path to config file

Value

Character vector of indicator codes

Get Indicator Info

Description

Returns full metadata for an indicator.

Usage

get_indicator_info(indicator_code)

Arguments

indicator_code

Character. UNICEF indicator code

Value

Named list with indicator metadata or NULL if not found

Examples


info <- get_indicator_info("CME_MRY0T4")
print(info$name)
# "Under-five mortality rate"

Get metadata for a specific indicator

Description

Get metadata for a specific indicator

Usage

get_indicator_meta(indicator_code)

Arguments

indicator_code

Indicator code

Value

List with indicator metadata or NULL

Get indicator codes by category

Description

Get indicator codes by category

Usage

get_indicators_by_category(category, config_path = NULL)

Arguments

category

Category name (e.g., 'mortality', 'nutrition')

config_path

Optional explicit path to config file

Value

Character vector of indicator codes

Get indicator codes by dataflow

Description

Get indicator codes by dataflow

Usage

get_indicators_by_dataflow(dataflow, config_path = NULL)

Arguments

dataflow

Dataflow name (e.g., 'CME', 'NUTRITION')

config_path

Optional explicit path to config file

Value

Character vector of indicator codes

Get indicator codes by SDG goal

Description

Get indicator codes by SDG goal

Usage

get_indicators_by_sdg(sdg_goal, config_path = NULL)

Arguments

sdg_goal

SDG goal number (e.g., '3', '4')

config_path

Optional explicit path to config file

Value

Character vector of indicator codes

Get metadata cache directory

Description

Get metadata cache directory

Usage

get_metadata_cache()

Value

Path to cache directory

Get R package root directory

Description

Attempts to locate the root of the R package by checking for specific files (unicefData.R or DESCRIPTION) in the current directory, R/ subdirectory, or parent directories.

Usage

get_package_root()

Value

Character path to the package root.

Get sample data from a dataflow to extract values

Description

Get sample data from a dataflow to extract values

Usage

get_sample_data(
  dataflow_id,
  max_rows = 10000,
  max_retries = 3,
  exhaustive_cols = NULL
)

Arguments

dataflow_id

Dataflow ID

max_rows

Maximum rows to fetch

max_retries

Number of retries

exhaustive_cols

Columns to extract ALL values for

Value

Named list mapping column names to value statistics

Get Schema Cache Information

Description

Display current cache contents and statistics.

Usage

get_schema_cache_info()

Value

Invisible data.frame with cache statistics

Examples


  get_schema_cache_info()

Fetch SDMX data or structure from any agency

Description

Download one or more SDMX flows from a specified agency, with paging, retries, caching, format & labels options, and post-processing.

Schemas are cached in memory per session for performance: subsequent indicators from the same dataflow load 8-17x faster (2.2s –> 0.13s).

Usage

get_sdmx(
  agency = "UNICEF",
  flow,
  key = NULL,
  start_period = NULL,
  end_period = NULL,
  nofilter = FALSE,
  detail = c("data", "structure"),
  version = NULL,
  format = c("csv", "sdmx-xml", "sdmx-json"),
  labels = c("id", "both", "none"),
  tidy = TRUE,
  country_names = TRUE,
  page_size = 100000L,
  retry = 3L,
  cache = FALSE,
  sleep = 0.2,
  post_process = NULL
)

Arguments

agency

Character agency ID (e.g., "UNICEF").

flow

Character vector of flow IDs; length >= 1.

key

Optional character vector of codes to filter the flow.

start_period

Optional single 4-digit year for start (e.g., 2000).

end_period

Optional single 4-digit year for end (e.g., 2020).

nofilter

Logical; if TRUE, fetch all disaggregations (no pre-fetch filtering); if FALSE (default), use efficient pre-fetch filtering (totals only per schema).

detail

One of "data" or "structure"; default "data".

version

Optional SDMX version; if NULL, auto-detected via list_sdmx_flows().

format

One of "csv", "sdmx-xml", "sdmx-json"; default "csv".

labels

One of "both","id","none"; default "both".

tidy

Logical; if TRUE, rename core columns and retain metadata; default TRUE.

country_names

Logical; if TRUE, join ISO3 to country names; default TRUE.

page_size

Rows per page for CSV; default 100000L.

retry

Number of retries; default 3L.

cache

Logical; if TRUE, cache per flow on disk; default FALSE.

sleep

Pause (in seconds) between pages; default 0.2.

post_process

Optional function to apply to raw tibble before tidy-up.

Value

A tibble (or list of tibbles) for data, or xml_document(s) for structure.

get_unicef

Description

Backward-compatible wrapper for unicefData(). Supports legacy parameters start_year/end_year and forwards additional options.

Usage

get_unicef(
  indicator,
  countries = NULL,
  start_year = NULL,
  end_year = NULL,
  year = NULL,
  dataflow = NULL,
  ignore_duplicates = FALSE,
  ...
)

Arguments

indicator

Character or vector of indicator codes

countries

Character vector of ISO3 codes (optional)

start_year

Integer start year (optional)

end_year

Integer end year (optional)

year

Character or integer specifying years (optional). If missing, constructed from start_year/end_year.

dataflow

Optional explicit dataflow ID

ignore_duplicates

Logical, removed duplicated rows after fetch

...

Additional arguments forwarded to unicefData()

Value

Tibble with standardized columns

Get path to a specific vintage

Description

Get path to a specific vintage

Usage

get_vintage_path(vintage = NULL, cache_dir = NULL)

Arguments

vintage

Vintage date (YYYY-MM-DD) or NULL for current

cache_dir

Optional cache directory path

Value

Path to vintage directory

Indicator Registry - Auto-sync UNICEF Indicator Metadata

Description

Key features:

Automatic download of indicator codelist from UNICEF SDMX API
Maps each indicator code to its dataflow (category)
Caches metadata locally in config/unicef_indicators_metadata.yaml
Supports offline usage after initial sync
Version tracking for cache freshness

Details

This module automatically fetches and caches the complete UNICEF indicator codelist from the SDMX API. The cache is created on first use and can be refreshed on demand.

Examples

# Auto-detect dataflow from indicator code
dataflow <- get_dataflow_for_indicator("CME_MRY0T4")
print(dataflow)  # "CME"

# Refresh cache manually
refresh_indicator_cache()

Description

List all available indicator categories (dataflows) with counts. Prints a formatted table of categories showing how many indicators are in each category.

Usage

list_categories()

Value

Invisibly returns a data.frame with category counts.

Examples


list_categories()

List available UNICEF SDMX dataflows

Description

Convenience wrapper around list_sdmx_flows() for parity with Python.

Usage

list_dataflows(
  agency = "UNICEF",
  retry = NULL,
  cache_dir = tools::R_user_dir("unicefdata", "cache"),
  max_retries = 3
)

Arguments

agency

Character agency ID (default "UNICEF").

retry

Integer. Number of retries for transient HTTP failures.

cache_dir

Directory for memoised cache.

max_retries

Integer. Number of retry attempts (default: 3). Alternative name for 'retry' parameter.

Value

A tibble with columns id, agency, version, name.

List Indicators

Description

List all known indicators, optionally filtered by dataflow or name.

Usage

list_indicators(dataflow = NULL, name_contains = NULL)

Arguments

dataflow

Character. Filter by dataflow/category (e.g., "CME", "NUTRITION")

name_contains

Character. Filter by name substring (case-insensitive)

Value

Named list of matching indicators

Examples


# Get all mortality indicators
mortality <- list_indicators(dataflow = "CME")

# Search by name
stunting <- list_indicators(name_contains = "stunting")

List SDMX codelist for a given agency and codelist identifier

Description

Download and cache the SDMX codelist definitions from a specified agency's REST endpoint.

Usage

list_sdmx_codelist(
  agency = "UNICEF",
  codelist_id,
  retry = 3L,
  cache_dir = tools::R_user_dir("get_sdmx", "cache")
)

Arguments

agency

Character agency ID (e.g., "UNICEF").

codelist_id

Character codelist identifier (e.g., "CL_UNICEF_INDICATOR").

retry

Number of retries for HTTP failures; default is 3.

cache_dir

Directory for on-disk cache; created if it does not exist.

Value

A tibble with columns code, description, and name.

List Available SDMX Flows for an Agency

Description

Download and cache the SDMX dataflow definitions from a specified agency's REST endpoint.

Usage

list_sdmx_flows(
  agency = "UNICEF",
  retry = 3L,
  cache_dir = tools::R_user_dir("unicefdata", "cache")
)

Arguments

agency

Character agency ID (e.g., "UNICEF").

retry

Number of retries for transient HTTP failures; default is 3.

cache_dir

Directory for on-disk cache; created if it does not exist.

Value

A tibble with columns id, agency, version, and name.

List SDMX codelist for a given flow + dimension

Description

List SDMX codelist for a given flow + dimension

Usage

list_unicef_codelist(
  flow,
  dimension,
  cache_dir = tools::R_user_dir("unicefData", "cache"),
  retry = 3
)

Arguments

flow

character flow ID, e.g. "NUTRITION"

dimension

character dimension ID within that flow, e.g. "INDICATOR"

cache_dir

Character path to cache directory.

retry

Integer, number of retries.

Value

A tibble with columns code and description

List Available UNICEF SDMX Flows

Description

Download and cache the SDMX data-flow definitions from the UNICEF REST endpoint.

Usage

list_unicef_flows(
  cache_dir = tools::R_user_dir("unicefData", "cache"),
  retry = 3
)

Arguments

cache_dir

Character path to cache directory.

retry

Integer, number of retries.

Value

A tibble with columns id, agency, and version

List available metadata vintages

Description

Returns dates of all metadata snapshots stored in the vintages/ directory. Vintages are sorted newest first.

Usage

list_vintages(cache_dir = NULL)

Arguments

cache_dir

Optional cache directory path

Value

Character vector of vintage dates (YYYY-MM-DD format)

Examples

## Not run: 
list_vintages()
# [1] "2025-12-02" "2025-11-15" "2025-10-01"

## End(Not run)

Load cached codelist metadata from YAML

Description

Load cached codelist metadata from YAML

Usage

load_codelists()

Value

List with codelist metadata

Load the full configuration from YAML

Description

Load the full configuration from YAML

Usage

load_config(config_path = NULL)

Arguments

config_path

Optional explicit path to config file

Value

Full configuration list

Load schema for a specific dataflow from cached YAML

Description

Load schema for a specific dataflow from cached YAML

Usage

load_dataflow_schema(dataflow_id, metadata_dir = NULL)

Arguments

dataflow_id

Dataflow ID (e.g., 'CME', 'NUTRITION')

metadata_dir

Directory containing dataflows/ subdirectory

Value

Schema list or NULL if not found

Load cached dataflow metadata from YAML

Description

Load cached dataflow metadata from YAML

Usage

load_dataflows()

Value

List with dataflow metadata

Load cached indicator metadata from YAML

Description

Load cached indicator metadata from YAML

Usage

load_indicators()

Value

List with indicator metadata

Load category definitions from shared config

Description

Load category definitions from shared config

Usage

load_shared_categories(config_path = NULL)

Arguments

config_path

Optional explicit path to config file

Value

Named list of category definitions

Load dataflow definitions from shared config

Description

Load dataflow definitions from shared config

Usage

load_shared_dataflows(config_path = NULL)

Arguments

config_path

Optional explicit path to config file

Value

Named list of dataflow definitions

Load indicator definitions from shared config

Description

Load indicator definitions from shared config

Usage

load_shared_indicators(config_path = NULL)

Arguments

config_path

Optional explicit path to config file

Value

Named list of indicator definitions

Load sync history

Description

Load sync history

Usage

load_sync_history()

Value

List with sync history (matches Python structure)

Load last sync summary (deprecated, use load_sync_history)

Description

Load last sync summary (deprecated, use load_sync_history)

Usage

load_sync_summary()

Value

List with sync summary from latest vintage

Load metadata from a specific vintage

Description

Load metadata from a specific vintage

Usage

load_vintage(vintage = NULL, cache_dir = NULL)

Arguments

vintage

Vintage date (YYYY-MM-DD) or NULL for current

cache_dir

Optional cache directory path

Value

List with dataflows, codelists, and indicators

Examples

## Not run: 
# Load current metadata
meta <- load_vintage()

# Load from specific vintage
meta <- load_vintage("2025-11-15")

## End(Not run)

Parse year parameter into start_year, end_year, and year_list

Description

Supports multiple formats for specifying years:

NULL: All years (no filtering)
Single integer: Just that year (e.g., 2020)
String with colon: Range (e.g., "2015:2023")
String with comma: List (e.g., "2015,2018,2020")
Integer vector: Explicit list of years

Usage

parse_year(year)

Arguments

year

Year specification in any supported format

Value

List with start_year, end_year, and year_list components

Examples

parse_year(2020)
# $start_year: 2020, $end_year: 2020, $year_list: NULL

parse_year("2015:2023")
# $start_year: 2015, $end_year: 2023, $year_list: NULL

parse_year("2015,2018,2020")
# $start_year: 2015, $end_year: 2020, $year_list: c(2015, 2018, 2020)

Print method for dataflow schema

Description

Print method for dataflow schema

Usage

## S3 method for class 'unicef_dataflow_schema'
print(x, ...)

Arguments

x

A unicef_dataflow_schema object

...

Additional arguments (ignored)

Value

Invisibly returns the input object x.

Execute a code block with labeled logging and error handling

Description

Execute a code block with labeled logging and error handling

Usage

process_block(label, expr)

Arguments

label

Character label for the block.

expr

Expression to evaluate.

Value

The result of the expression or NULL on error.

Refresh Indicator Cache

Description

Force refresh of the indicator cache from UNICEF SDMX API.

Usage

refresh_indicator_cache()

Value

Integer. Number of indicators in the refreshed cache

Examples


n <- refresh_indicator_cache()
message(sprintf("Refreshed cache with %d indicators", n))

Safely read a CSV file with error handling and logging

Description

Safely read a CSV file with error handling and logging

Usage

safe_read_csv(path, label = NULL, show_col_types = FALSE)

Arguments

path

Character path to the CSV file.

label

Optional label for logging (defaults to basename of path).

show_col_types

Logical; whether to show column types (default FALSE).

Value

A data frame (tibble) or NULL if an error occurs.

Safely read a CSV from a URL with error handling

Description

Safely read a CSV from a URL with error handling

Usage

safe_read_csv_url(url, name)

Arguments

url

Character URL to the CSV file.

name

Character name for logging purposes.

Value

A data frame (tibble) or NULL if an error occurs.

Safely save a data frame to CSV using base R

Description

Safely save a data frame to CSV using base R

Usage

safe_save_csv(df, path, label)

Arguments

df

Data frame to save.

path

Character path where the CSV should be saved.

label

Character label for logging purposes.

Value

None (invisible).

Safely write a data frame to CSV with error handling and logging

Description

Safely write a data frame to CSV with error handling and logging

Usage

safe_write_csv(df, path, label = NULL)

Arguments

df

Data frame to save.

path

Character path where the CSV should be saved.

label

Optional label for logging (defaults to basename of path).

Value

None (invisible).

Schema Caching System for UNICEF SDMX API

Description

Implements in-memory caching of SDMX metadata schemas to reduce API calls and improve performance during interactive analysis sessions.

Details

This module provides:

Session-level schema cache to avoid redundant API calls
Automatic expiry based on age
Programmatic cache invalidation
Cache statistics and monitoring

Examples

## Not run: 
  # Cache is managed automatically when get_sdmx() is called with cache=TRUE

  # Manual cache operations:
  get_schema_cache_info()
  clear_schema_cache()

  # Multiple calls within session use cached schema
  df1 <- get_sdmx(indicator = "SP.POP.TOTL", cache = TRUE)
  df2 <- get_sdmx(indicator = "NY.GDP.MKTP.CD", cache = TRUE)

## End(Not run)

Search Indicators

Description

Search and display UNICEF indicators in a user-friendly format. This function allows analysts to search the indicator metadata to find indicator codes they need. Results are printed to the screen in a formatted table.

Usage

search_indicators(
  query = NULL,
  category = NULL,
  limit = 50,
  show_description = TRUE
)

Arguments

query

Character. Search term to match in indicator code, name, or description (case-insensitive). If NULL, shows all indicators.

category

Character. Filter by dataflow/category (e.g., "CME", "NUTRITION"). Use list_categories() to see available categories.

limit

Integer. Maximum number of results to display (default: 50). Set to NULL or 0 to show all matches.

show_description

Logical. If TRUE, includes description column (default: TRUE).

Value

Invisibly returns a data.frame with the matching indicators. Results are also printed to the screen.

Examples


# Search for mortality-related indicators
search_indicators("mortality")

# List all nutrition indicators
search_indicators(category = "NUTRITION")

# Search for stunting across all categories
search_indicators("stunting")

# List all indicators (first 50)
search_indicators()

# List all CME indicators without limit
search_indicators(category = "CME", limit = 0)

Set metadata cache directory

Description

Set metadata cache directory

Usage

set_metadata_cache(path = NULL)

Arguments

path

Path to cache directory. If NULL, uses tempdir() for temporary caching. To create a persistent cache in your project, explicitly set a directory path.

Value

Invisibly returns the path to the cache directory.

Examples


# Use temporary cache (default, no files created in home directory)
set_metadata_cache()

# Use persistent cache in a project directory (explicit opt-in)
set_metadata_cache(tempdir())

Simplify columns to essentials

Description

Simplify columns to essentials

Usage

simplify_columns(df, format)

Sync all metadata from UNICEF SDMX API

Description

Downloads dataflows, codelists, countries, regions, indicator definitions, and optionally dataflow schemas, saving them as YAML files with standardized watermarks.

Usage

sync_all_metadata(
  verbose = TRUE,
  output_dir = NULL,
  include_schemas = TRUE,
  include_sample_values = TRUE
)

Arguments

verbose

Print progress messages

output_dir

Output directory (default: R/metadata/current/)

include_schemas

Sync dataflow schemas (default: TRUE). This generates dataflow_index.yaml and individual dataflow YAML files in dataflows/

include_sample_values

Include sample values in schemas (default: TRUE)

Value

List with sync summary

Examples

## Not run: 
# Sync all metadata including schemas
results <- sync_all_metadata()

# Sync without schemas (faster)
results <- sync_all_metadata(include_schemas = FALSE)

# Sync with custom output directory
results <- sync_all_metadata(output_dir = "./my_metadata/")

## End(Not run)

Sync codelist definitions from SDMX API (excluding countries/regions)

Description

Sync codelist definitions from SDMX API (excluding countries/regions)

Sync codelists (excluding countries/regions)

Usage

sync_codelists(codelist_ids = NULL, verbose = TRUE, output_dir = NULL)

sync_codelists(codelist_ids = NULL, verbose = TRUE, output_dir = NULL)

Arguments

codelist_ids

Vector of codelist IDs

verbose

Print progress

output_dir

Output directory

Value

List with codelist metadata

List of codelists

Sync country codes from CL_COUNTRY

Description

Sync country codes from CL_COUNTRY

Usage

sync_countries(verbose = TRUE, output_dir = NULL)

sync_countries(verbose = TRUE, output_dir = NULL)

Arguments

verbose

Print progress

output_dir

Output directory

Value

List with country codes

Named list of countries (code -> name)

Sync dataflow schemas from SDMX API to YAML file

Description

Sync dataflow schemas from SDMX API to YAML file

Usage

sync_dataflow_schemas(
  output_dir = NULL,
  verbose = TRUE,
  dataflows = NULL,
  include_sample_values = TRUE
)

Arguments

output_dir

Directory to save schemas (default: ../metadata/current)

verbose

Print progress messages

dataflows

Character vector of specific dataflow IDs to sync (default: all)

include_sample_values

Fetch sample data and include top 10 most frequent values per column

Value

List with sync results

Sync dataflow definitions from SDMX API

Description

Sync dataflow definitions from SDMX API

Sync dataflow definitions

Usage

sync_dataflows(verbose = TRUE, output_dir = NULL)

sync_dataflows(verbose = TRUE, output_dir = NULL)

Arguments

verbose

Print progress

output_dir

Output directory

Value

List with dataflow metadata

List of dataflows

Sync indicator mappings (indicator -> dataflow)

Description

Uses the shared common_indicators.yaml config file to ensure consistency across Python, R, and Stata platforms.

Usage

sync_indicators(dataflows = NULL, verbose = TRUE, output_dir = NULL)

Arguments

dataflows

List of dataflows (from sync_dataflows)

verbose

Print progress

output_dir

Output directory

Value

List with indicators and indicators_by_dataflow

Sync all metadata from UNICEF SDMX API

Description

Downloads dataflows, codelists, countries, regions, and indicator definitions, then saves them as YAML files in the cache directory with standardized watermarks.

Usage

sync_metadata(cache_dir = NULL, verbose = TRUE)

Arguments

cache_dir

Path to cache directory (default: ./metadata/)

verbose

Print progress messages (default: TRUE)

Value

List with sync summary including counts and timestamps

Examples

## Not run: 
sync_metadata()
sync_metadata(cache_dir = "./my_cache/")

## End(Not run)

Sync regional/aggregate codes from CL_WORLD_REGIONS

Description

Sync regional/aggregate codes from CL_WORLD_REGIONS

Sync regional codes from CL_WORLD_REGIONS

Usage

sync_regions(verbose = TRUE, output_dir = NULL)

sync_regions(verbose = TRUE, output_dir = NULL)

Arguments

verbose

Print progress

output_dir

Output directory

Value

List with regional codes

Named list of regions (code -> name)

Fetch UNICEF SDMX data or structure

Description

Download UNICEF indicator data from the SDMX data warehouse. Supports automatic paging, retrying on transient failure, memoisation, and tidy-up.

This function uses unified parameter names consistent with the Python package.

Usage

unicefData(
  indicator = NULL,
  dataflow = NULL,
  countries = NULL,
  year = NULL,
  sex = "_T",
  totals = FALSE,
  age = NULL,
  wealth = NULL,
  residence = NULL,
  maternal_edu = NULL,
  tidy = TRUE,
  include_label_columns = FALSE,
  country_names = TRUE,
  max_retries = 3,
  cache = FALSE,
  page_size = 1e+05,
  detail = c("data", "structure"),
  version = NULL,
  labels = "id",
  metadata = "light",
  format = c("long", "wide", "wide_indicators", "wide_attributes", "wide_sex",
    "wide_age", "wide_wealth", "wide_residence", "wide_maternal_edu"),
  pivot = NULL,
  latest = FALSE,
  circa = FALSE,
  add_metadata = NULL,
  dropna = FALSE,
  simplify = FALSE,
  mrv = NULL,
  raw = FALSE,
  ignore_duplicates = FALSE
)

unicefdata(
  indicator = NULL,
  dataflow = NULL,
  countries = NULL,
  year = NULL,
  sex = "_T",
  totals = FALSE,
  age = NULL,
  wealth = NULL,
  residence = NULL,
  maternal_edu = NULL,
  tidy = TRUE,
  include_label_columns = FALSE,
  country_names = TRUE,
  max_retries = 3,
  cache = FALSE,
  page_size = 1e+05,
  detail = c("data", "structure"),
  version = NULL,
  labels = "id",
  metadata = "light",
  format = c("long", "wide", "wide_indicators", "wide_attributes", "wide_sex",
    "wide_age", "wide_wealth", "wide_residence", "wide_maternal_edu"),
  pivot = NULL,
  latest = FALSE,
  circa = FALSE,
  add_metadata = NULL,
  dropna = FALSE,
  simplify = FALSE,
  mrv = NULL,
  raw = FALSE,
  ignore_duplicates = FALSE
)

Arguments

indicator

Character vector of indicator codes (e.g., "CME_MRY0T4").

dataflow

Character vector of dataflow IDs (e.g., "CME", "NUTRITION").

countries

Character vector of ISO3 country codes (e.g., c("ALB", "USA")). If NULL (default), fetches all countries.

year

Year specification. Supports multiple formats:

NULL: All available years (default)
Single integer: Just that year (e.g., 2020)
String with colon: Range (e.g., "2015:2023")
String with comma: Non-contiguous years (e.g., "2015,2018,2020")
Integer vector: Explicit list of years (e.g., c(2015, 2018, 2020))

sex

Sex disaggregation: "_T" (total, default), "F" (female), "M" (male).

totals

Logical; if FALSE (default), excludes observations with _T (total) codes in dimension values, matching Python/Stata behavior. Set to TRUE to include totals.

age

Filter by age group. Default is NULL (keeps totals).

wealth

Filter by wealth quintile. Default is NULL (keeps totals).

residence

Filter by residence (e.g. "URBAN", "RURAL"). Default is NULL (keeps totals).

maternal_edu

Filter by maternal education. Default is NULL (keeps totals).

tidy

Logical; if TRUE (default), returns cleaned tibble with standardized column names.

include_label_columns

Logical; if FALSE (default), drops human-readable label-expansion columns added by SDMX when labels=both; produces a codes-only schema consistent across R/Python/Stata.

country_names

Logical; if TRUE (default), adds country name column.

max_retries

Number of retry attempts on failure (default: 3). Previously called 'retry'. Both parameter names are supported.

cache

Logical; if TRUE, memoises results.

page_size

Integer rows per page (default: 100000).

detail

"data" (default) or "structure" for metadata.

version

Optional SDMX version; if NULL, auto-detected.

labels

Label format for SDMX requests: "id" (codes only, default), "name" (labels only), or "both" (codes and labels).

metadata

Metadata detail level: "light" (default) or "full".

format

Output format: "long" (default), "wide" (years as columns), "wide_indicators" (indicators as columns), or wide by dimension: "wide_sex", "wide_age", "wide_wealth", "wide_residence", "wide_maternal_edu".

pivot

Character vector of column(s) to pivot to wide format. Alternative to format parameter for custom pivoting.

latest

Logical; if TRUE, keep only the most recent non-missing value per country. The year may differ by country. Useful for cross-sectional analysis.

circa

Logical; if TRUE, for each specified year find the closest available data point. When exact years aren't available, returns observations with periods closest to the requested year(s). Different countries may have different actual years. Only applies when specific years are requested.

add_metadata

Character vector of metadata to add: "region", "income_group", "continent", "indicator_name", "indicator_category".

dropna

Logical; if TRUE, remove rows with missing values.

simplify

Logical; if TRUE, keep only essential columns.

mrv

Integer; keep only the N most recent values per country (Most Recent Values).

raw

Logical; if TRUE, return raw SDMX data without column standardization. Default is FALSE (clean, standardized output matching Python package).

ignore_duplicates

Logical; if FALSE (default), raises an error when exact duplicate rows are found (all column values identical). Set to TRUE to allow automatic removal of duplicates.

Value

Tibble with indicator data, or xml_document if detail="structure". The 'period' column contains decimal years (see Time Period Handling section).

Time Period Handling

The UNICEF SDMX API returns TIME_PERIOD values in various formats (annual "2020" or monthly "2020-03"). This function automatically converts monthly periods to decimal years for consistent time-series analysis:

"2020" becomes 2020.0 (integer year)
"2020-01" becomes 2020.0833 (2020 + 1/12, January)
"2020-06" becomes 2020.5000 (2020 + 6/12, June)
"2020-11" becomes 2020.9167 (2020 + 11/12, November)

Formula: decimal_year = year + month/12

Cross-Platform Consistency

By default, unicefData returns a codes-only schema that matches the Python and Stata implementations. Specifically:

SDMX requests use codes (labels=id) or client-side filtering removes human-readable label-expansion columns.
Output keeps standardized lowercase context columns (e.g., iso3, indicator, period, value) plus code columns for dimensions.
Indicator-specific dimension code columns are preserved (often lowercase).
Duplicate label columns are not included unless include_label_columns = TRUE is explicitly set.

This ensures column/row counts align across R, Python, and Stata by default.

Examples


# Fetch under-5 mortality for year range
df <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("ALB", "USA", "BRA"),
  year = "2015:2023"
)

# Single year
df <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("ALB", "USA"),
  year = 2020
)

# Non-contiguous years
df <- unicefData(
  indicator = "CME_MRY0T4",
  year = "2015,2018,2020"
)

# Circa mode - find closest available year
df <- unicefData(
  indicator = "CME_MRY0T4",
  year = 2015,
  circa = TRUE  # Returns closest to 2015 for each country
)

# Get latest value per country (cross-sectional)
df <- unicefData(
  indicator = "CME_MRY0T4",
  latest = TRUE
)

# Wide format with region metadata
df <- unicefData(
  indicator = "CME_MRY0T4",
  format = "wide",
  add_metadata = c("region", "income_group")
)

# Multiple indicators merged automatically
df <- unicefData(
  indicator = c("CME_MRY0T4", "NT_ANT_HAZ_NE2_MOD"),
  format = "wide_indicators",
  latest = TRUE
)

Fetch Raw UNICEF Data

Description

Low-level fetcher for UNICEF SDMX API.

Usage

unicefData_raw(
  indicator = NULL,
  dataflow = NULL,
  countries = NULL,
  start_year = NULL,
  end_year = NULL,
  max_retries = 3,
  version = NULL,
  page_size = 1e+05,
  verbose = TRUE,
  totals = FALSE,
  labels = "id"
)

unicefdata_raw(
  indicator = NULL,
  dataflow = NULL,
  countries = NULL,
  start_year = NULL,
  end_year = NULL,
  max_retries = 3,
  version = NULL,
  page_size = 1e+05,
  verbose = TRUE,
  totals = FALSE,
  labels = "id"
)

Arguments

indicator

Character vector of indicator codes.

dataflow

Character string of dataflow ID.

countries

Character vector of ISO3 codes.

start_year

Numeric or character start year (YYYY).

end_year

Numeric or character end year (YYYY).

max_retries

Integer, number of retries for failed requests.

version

Character string of SDMX version (e.g. "1.0").

page_size

Integer, number of rows per page.

verbose

Logical, print progress messages.

totals

Logical, include total aggregations.

labels

Character, label format ("id" or "name").

Value

A tibble of raw SDMX data, or an empty tibble if no data found.

Validate a data frame against cached metadata

Description

Checks:

Indicator code exists in catalog
Required columns are present
Country codes are valid
Values are within expected ranges

Usage

validate_data(df, indicator_code, strict = FALSE)

Arguments

df

Data frame to validate

indicator_code

Expected indicator code

strict

If TRUE, fail on any warning

Value

List with is_valid (logical) and issues (character vector)

Examples

## Not run: 
result <- validate_data(df, "CME_MRY0T4")
if (result$is_valid) {
  message("Data is valid!")
} else {
  message("Issues found:")
  print(result$issues)
}

## End(Not run)

Validate Data Against Schema

Description

Checks if the data matches the expected schema for the dataflow.

Usage

validate_unicef_schema(df, dataflow_id)

Arguments

df

Data frame to validate

dataflow_id

Dataflow ID

Value

Validated data frame (warnings issued if mismatch)

Package {unicefData}

Store Schema in Cache

Description

Usage

Arguments

Value

Create a vintage snapshot of current metadata

Description

Usage

Arguments

Value

Create watermarked metadata structure

Description

Usage

Arguments

Value

Create a list with standardized watermark header (matches Python format)

Description

Usage

Arguments

Ensure indicator cache is loaded

Description

Usage

Arguments

Value

Fetch codelist from SDMX API

Description

Usage

Arguments

Value

Fetch indicator codelist from UNICEF SDMX API

Description

Usage

Value

Fetch data from a single dataflow with 404 detection

Description

Usage

Arguments

Value

Fetch with retries

Description

Usage

Arguments

Value

Fetch XML content from URL with retry logic

Description

Usage

Arguments

Value

Find metadata directory

Description

Usage

Get basic dataflow info from _unicefdata_dataflows.yaml

Description

Usage

Get path to the indicator cache file

Description

Usage

Retrieve Schema from Cache

Description

Usage

Arguments

Value

Get fallback indicator definitions

Description

Usage

Value

Get fallback dataflow sequences (lazy loading)

Description

Usage

Value

Get path to fallback sequences file

Description

Usage

Get metadata directory path

Description

Usage

Value

Infer organizational CATEGORY from indicator code prefix

Description