---
title: "Measurement System: Two-Factor CFA"
author: "Greg Veramendi"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Measurement System: Two-Factor CFA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

This vignette shows how to specify and estimate a two-factor measurement system
with **factorana**. The latent factors are identified from a set of noisy
indicators (measurements) whose loadings the package estimates jointly with
the factor variances.

The example below uses two latent factors ("cognitive" and "non-cognitive"
skills), each measured by three continuous indicators. The sample is small
(n = 300) so that the vignette runs quickly.
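
As a rough guide to what is being estimated, the measurement equation for a
continuous indicator can be written as follows. The notation is ours, chosen
for this vignette, and is not taken from the package documentation: $i$ indexes
observations, $j$ indexes indicators, and $k(j)$ is the single factor that
indicator $j$ loads on.

$$
y_{ij} = \alpha_j + \lambda_j \, f_{i,k(j)} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma_j^2),
\qquad f_{i,k} \sim N(0, \sigma_{f,k}^2)
$$

Here $\alpha_j$ is the intercept, $\lambda_j$ the loading, and $\sigma_{f,k}^2$
the variance of factor $k$. The simulation in the next section generates data
from exactly this form, with normal errors and normal factors.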

## Simulate data

```{r}
library(factorana)

set.seed(1)
n <- 300

# Latent factors (true values, unobserved in practice)
f_cog    <- rnorm(n, mean = 0, sd = 1.0)
f_noncog <- rnorm(n, mean = 0, sd = 0.8)

# Cognitive indicators: loadings (1.0, 0.9, 0.7); error sd = 0.5
cog1 <- 0.0 + 1.0 * f_cog + rnorm(n, 0, 0.5)
cog2 <- 0.2 + 0.9 * f_cog + rnorm(n, 0, 0.5)
cog3 <- 0.1 + 0.7 * f_cog + rnorm(n, 0, 0.5)

# Non-cognitive indicators: loadings (1.0, 1.1, 0.8); error sd = 0.5
nc1 <- 0.0 + 1.0 * f_noncog + rnorm(n, 0, 0.5)
nc2 <- 0.1 + 1.1 * f_noncog + rnorm(n, 0, 0.5)
nc3 <- 0.0 + 0.8 * f_noncog + rnorm(n, 0, 0.5)

dat <- data.frame(
  intercept = 1,
  cog1 = cog1, cog2 = cog2, cog3 = cog3,
  nc1  = nc1,  nc2  = nc2,  nc3  = nc3
)
head(dat)
```

## Define the factor model

The model has two independent latent factors. Loading normalizations are set on
the component side: each factor has one indicator whose loading is fixed at 1
(to pin down the factor's scale) and two indicators whose loadings are free.

```{r}
fm <- define_factor_model(n_factors = 2, factor_structure = "independent")
```
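
To see why fixing one loading at 1 is enough to pin down the scale, the sketch
below recovers the factor variance and the two free loadings from pairwise
covariances alone. It is a base-R illustration on a separate, larger simulated
sample (the names `n_big`, `f_sim`, `y1`, `y2`, `y3`, and `moment_est` are
introduced here for illustration); it is not how **factorana** estimates the
model.

```{r}
# Illustrative only: same design as the cognitive block
# (loadings 1, 0.9, 0.7; factor variance 1; error sd 0.5), larger sample.
set.seed(123)
n_big <- 20000
f_sim <- rnorm(n_big)
y1 <- 1.0 * f_sim + rnorm(n_big, 0, 0.5)  # loading normalized to 1
y2 <- 0.9 * f_sim + rnorm(n_big, 0, 0.5)
y3 <- 0.7 * f_sim + rnorm(n_big, 0, 0.5)

# With independent errors, covariances depend only on loadings and Var(f)
c12 <- cov(y1, y2)  # lambda2 * Var(f)
c13 <- cov(y1, y3)  # lambda3 * Var(f)
c23 <- cov(y2, y3)  # lambda2 * lambda3 * Var(f)

moment_est <- c(
  var_f   = c12 * c13 / c23,
  lambda2 = c23 / c13,
  lambda3 = c23 / c12
)
round(moment_est, 2)
```

Without the normalization, only the products of loadings and factor variance
would be determined, so the scale of the factor would be arbitrary.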

## Define model components

For each indicator we declare a linear equation, which factor(s) it loads on,
and any fixed loadings.  `loading_normalization` takes a vector of length
`n_factors`:

- `1`  = loading fixed at 1 (scale normalization)
- `0`  = loading fixed at 0 (indicator is unrelated to that factor)
- `NA_real_` = free parameter, to be estimated

```{r}
# Cognitive indicators: load on factor 1 only
mc_cog1 <- define_model_component(
  name = "cog1", data = dat, outcome = "cog1", factor = fm,
  covariates = "intercept", model_type = "linear",
  loading_normalization = c(1, 0)        # factor 1 loading = 1, factor 2 loading = 0
)
mc_cog2 <- define_model_component(
  name = "cog2", data = dat, outcome = "cog2", factor = fm,
  covariates = "intercept", model_type = "linear",
  loading_normalization = c(NA_real_, 0) # factor 1 loading free, factor 2 loading = 0
)
mc_cog3 <- define_model_component(
  name = "cog3", data = dat, outcome = "cog3", factor = fm,
  covariates = "intercept", model_type = "linear",
  loading_normalization = c(NA_real_, 0)
)

# Non-cognitive indicators: load on factor 2 only
mc_nc1 <- define_model_component(
  name = "nc1", data = dat, outcome = "nc1", factor = fm,
  covariates = "intercept", model_type = "linear",
  loading_normalization = c(0, 1)
)
mc_nc2 <- define_model_component(
  name = "nc2", data = dat, outcome = "nc2", factor = fm,
  covariates = "intercept", model_type = "linear",
  loading_normalization = c(0, NA_real_)
)
mc_nc3 <- define_model_component(
  name = "nc3", data = dat, outcome = "nc3", factor = fm,
  covariates = "intercept", model_type = "linear",
  loading_normalization = c(0, NA_real_)
)
```
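
The six normalization vectors above can be easier to audit when stacked into a
single pattern matrix. The following base-R sketch simply retypes those vectors
(the object `pattern` is introduced here for illustration and is not used by
the package) and counts fixed and free loadings per factor.

```{r}
# Rows are indicators, columns are factors; NA marks a free loading
pattern <- rbind(
  cog1 = c(1, 0),
  cog2 = c(NA_real_, 0),
  cog3 = c(NA_real_, 0),
  nc1  = c(0, 1),
  nc2  = c(0, NA_real_),
  nc3  = c(0, NA_real_)
)
colnames(pattern) <- c("factor_1", "factor_2")
pattern

# Each factor should have exactly one loading fixed at 1 ...
colSums(pattern == 1, na.rm = TRUE)
# ... and, in this example, two free loadings
colSums(is.na(pattern))
```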

Assemble the components into a system:

```{r}
ms <- define_model_system(
  components = list(mc_cog1, mc_cog2, mc_cog3, mc_nc1, mc_nc2, mc_nc3),
  factor = fm
)
```

## Estimate

The estimator uses Gauss-Hermite quadrature to integrate over the latent
factors; we keep `n_quad_points` modest here for speed.

```{r}
ctrl <- define_estimation_control(n_quad_points = 6, num_cores = 1)

fit <- estimate_model_rcpp(
  model_system = ms,
  data         = dat,
  control      = ctrl,
  optimizer    = "nlminb",
  parallel     = FALSE,
  verbose      = FALSE
)

fit$convergence  # 0 indicates successful convergence
```
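
For intuition about what the quadrature points do, the sketch below builds a
6-point Gauss-Hermite rule for a standard normal variable in base R, using the
Golub-Welsch eigenvalue construction. This is a generic textbook construction
and `gh_rule` is a hypothetical helper written for this vignette; it is not
**factorana**'s internal code, and the package may scale or place its nodes
differently.

```{r}
# Nodes and weights such that sum(weights * g(nodes)) approximates E[g(Z)],
# with Z ~ N(0, 1). Requires n_points >= 2.
gh_rule <- function(n_points) {
  k <- seq_len(n_points - 1)
  J <- matrix(0, n_points, n_points)
  # Jacobi matrix of the probabilists' Hermite polynomials
  J[cbind(k, k + 1)] <- sqrt(k)
  J[cbind(k + 1, k)] <- sqrt(k)
  eig <- eigen(J, symmetric = TRUE)
  list(nodes = eig$values, weights = eig$vectors[1, ]^2)
}

rule <- gh_rule(6)
sum(rule$weights)                 # weights sum to 1
sum(rule$weights * rule$nodes^2)  # approximates E[Z^2] = 1
sum(rule$weights * rule$nodes^4)  # approximates E[Z^4] = 3
```

An n-point rule integrates polynomials up to degree 2n - 1 exactly (up to
floating-point error), which is why a modest number of points is often adequate
for smooth likelihood contributions. More points cost more time per likelihood
evaluation.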

## Inspect estimates

```{r}
# Tidy table of parameter estimates with standard errors
components_table(fit, digits = 3)
```

The estimated factor variances should be close to the true values (1.0 for the
cognitive factor and 0.64 = 0.8^2 for the non-cognitive factor), and the free
loadings close to the simulated ones (cog2 ≈ 0.9, cog3 ≈ 0.7, nc2 ≈ 1.1,
nc3 ≈ 0.8), up to sampling error at n = 300.

## Factor scores

Posterior mean factor scores for each observation can be recovered from the
estimated model:

```{r}
fscores <- estimate_factorscores_rcpp(
  fit, dat, control = ctrl, parallel = FALSE, verbose = FALSE
)
head(fscores[, c("obs_id", "factor_1", "factor_2",
                 "se_factor_1", "se_factor_2", "converged")])

# Correlation of estimated factor scores with the true (unobserved) factors
cor(fscores$factor_1, f_cog)
cor(fscores$factor_2, f_noncog)
```
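
In the special case of one normal factor with linear indicators and known
parameters, the posterior mean has a closed form, which gives a feel for what a
factor score is. The sketch below is a base-R illustration under those
assumptions, using the true simulation parameters of the cognitive block on a
fresh sample. All names (`n_sim`, `f_true`, `Y`, `post_var`, `post_mean`) are
introduced here; the package's own routine works from the estimated model and
may differ in detail.

```{r}
set.seed(2)
n_sim  <- 2000
alpha  <- c(0.0, 0.2, 0.1)   # intercepts
lambda <- c(1.0, 0.9, 0.7)   # loadings
sd_e   <- 0.5                # common error sd
var_f  <- 1.0                # factor variance

f_true <- rnorm(n_sim, 0, sqrt(var_f))
Y <- sapply(1:3, function(j) alpha[j] + lambda[j] * f_true + rnorm(n_sim, 0, sd_e))

# Normal prior times normal likelihood gives a normal posterior for f:
# precision = prior precision + sum of lambda_j^2 / sigma_e^2
post_var  <- 1 / (1 / var_f + sum(lambda^2) / sd_e^2)
# posterior mean = post_var * sum_j lambda_j * (y_j - alpha_j) / sigma_e^2
post_mean <- post_var * as.vector(sweep(Y, 2, alpha) %*% (lambda / sd_e^2))

cor(post_mean, f_true)        # sample correlation with the true factor
sqrt(1 - post_var / var_f)    # its population counterpart
```

The posterior variance shrinks as indicators are added or made less noisy,
which is why scores from three fairly precise indicators track the true factor
closely but not perfectly.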

## Where to go next

- `vignette("roy_model", package = "factorana")` — an applied example with a
  discrete sector-choice component and partially observed potential outcomes.
- `?define_model_component` — supported model types (linear, probit, logit,
  ordered probit) and options for factor interactions and type intercepts.
- `?estimate_model_rcpp` — full list of estimator options, including parallel
  estimation, checkpointing, and adaptive integration for two-stage workflows.