---
title: "Sequence Plots: heatmap, index, and distribution"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Sequence Plots: heatmap, index, and distribution}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
                      fig.width = 7, fig.height = 5.5,
                      out.width = "100%",
                      dev = "png", dpi = 72)
```

```{r setup, message = FALSE}
library(Nestimate)
```

## The dataset

`trajectories` ships with Nestimate. It holds 138 student sequences
(rows) over 15 weeks (columns). Each cell is one of three engagement
states (`Active`, `Average`, `Disengaged`) or `NA` for a missed week.

```{r data}
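data(trajectories)
dim(trajectories)          # sequences x weeks
head(trajectories[, 1:8])  # first 8 weeks of the first rows
# Distinct states; na.last = NA drops NA from the sorted result
sort(unique(as.vector(trajectories)), na.last = NA)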
```

`sequence_plot()` is the single entry point for three views of this data:

| `type` | What it shows | Uses dendrogram? | Facets? |
|---|---|---|---|
| `"heatmap"` *(default)* | dense carpet, rows ordered by a distance-based dendrogram | yes, except for non-distance sorts such as `start` | no |
| `"index"` | carpet without a dendrogram, optional row gaps | no | yes |
| `"distribution"` | stacked area / bar of state proportions over time | no | yes |

Defaults: `legend = "right"`, `frame = FALSE`.

## 1. `type = "heatmap"` — clustered carpet

### 1.1 Default — LCS distance, ward.D2 dendrogram

```{r h-default}
sequence_plot(trajectories)
```
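
For intuition, the default ordering can be approximated in base R. The
sketch below is an illustration under stated assumptions, not
Nestimate's implementation: it uses a hand-made `toy` matrix, takes LCS
distance to be the summed length of two sequences minus twice the length
of their longest common subsequence, ignores `NA` handling, and passes
the result to `stats::hclust()` with `method = "ward.D2"`. The helpers
`lcs_length()` and `lcs_dist()` are hypothetical names introduced here.

```{r sketch-lcs-order}
# Toy data: 5 sequences x 6 weeks (not the package dataset)
toy <- rbind(
  s1 = c("Active", "Active", "Average", "Average", "Disengaged", "Disengaged"),
  s2 = c("Active", "Active", "Active", "Average", "Average", "Disengaged"),
  s3 = c("Disengaged", "Disengaged", "Disengaged", "Average", "Average", "Average"),
  s4 = c("Active", "Active", "Active", "Active", "Active", "Active"),
  s5 = c("Disengaged", "Disengaged", "Average", "Average", "Average", "Average")
)

# Length of the longest common subsequence (classic dynamic programme)
lcs_length <- function(a, b) {
  m <- matrix(0L, length(a) + 1L, length(b) + 1L)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      m[i + 1L, j + 1L] <- if (a[i] == b[j]) {
        m[i, j] + 1L
      } else {
        max(m[i, j + 1L], m[i + 1L, j])
      }
    }
  }
  m[length(a) + 1L, length(b) + 1L]
}

# Assumed definition: elements of either sequence outside the common subsequence
lcs_dist <- function(a, b) length(a) + length(b) - 2L * lcs_length(a, b)

# Pairwise distance matrix over all rows
n <- nrow(toy)
d <- matrix(0, n, n, dimnames = list(rownames(toy), rownames(toy)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) d[i, j] <- lcs_dist(toy[i, ], toy[j, ])
}

# Ward clustering; hc$order is the row order a clustered carpet would use
hc <- hclust(as.dist(d), method = "ward.D2")
rownames(toy)[hc$order]
```

Rows that share long runs of the same states end up adjacent in
`hc$order`, which is what makes blocks of colour visible in a clustered
carpet.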

### 1.2 Switch the sort strategy

```{r h-freq}
sequence_plot(trajectories, sort = "frequency",
              main = "sort = 'frequency'")
```

```{r h-hamming}
sequence_plot(trajectories, sort = "hamming",
              main = "sort = 'hamming'")
```

```{r h-start}
sequence_plot(trajectories, sort = "start",
              main = "sort = 'start' (no dendrogram)")
```

Available values for `sort`: `lcs` (default), `frequency`, `start`,
`end`, plus any `build_clusters()` distance (`hamming`, `osa`, `lv`,
`dl`, `qgram`, `cosine`, `jaccard`, `jw`).
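
The choice of distance changes which rows count as similar. As a
minimal base-R sketch (the helper `hamming_dist()` is a hypothetical
name, the vectors are made up, and `NA` handling is left out), Hamming
distance compares two sequences position by position:

```{r sketch-hamming}
# Hamming distance: number of positions at which two equal-length
# sequences differ
hamming_dist <- function(a, b) {
  stopifnot(length(a) == length(b))
  sum(a != b)
}

x <- c("Active", "Average", "Disengaged", "Active", "Average", "Disengaged")
y <- c("Average", "Disengaged", "Active", "Average", "Disengaged", "Active")

# y is x shifted by one week, so every position differs
hamming_dist(x, y)
```

A subsequence-based distance such as the default `lcs` would rate the
same pair as close, because five of the six states line up once the
shift is allowed. Position-wise distances suit data where timing
matters; alignment-based ones suit data where the order of states
matters more than the exact week.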

### 1.3 Cluster separators with `k`

Setting `k` cuts the dendrogram into `k` groups and draws a thin
horizontal line at each cluster boundary in the ordered rows. Adjust
the lines with `k_color` and `k_line_width`.

```{r h-k3}
sequence_plot(trajectories, k = 3,
              main = "k = 3 — white separators")
```

```{r h-k5-black}
sequence_plot(trajectories, k = 5,
              k_color = "black", k_line_width = 1.2,
              main = "k = 5 — thin black")
```
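
How separator positions follow from a cut can be sketched in base R
with `stats::cutree()`. This is an assumption about the general
technique, not a description of Nestimate's internals, and the numeric
`pts` vector is made-up data standing in for a sequence distance matrix.

```{r sketch-k-boundaries}
# Toy 1-D data with three well-separated groups (sizes 4, 3, 5)
set.seed(1)
pts <- c(rnorm(4, mean = 0), rnorm(3, mean = 10), rnorm(5, mean = 20))
hc <- hclust(dist(pts), method = "ward.D2")

k <- 3
membership <- cutree(hc, k = k)              # cluster id per original row
ordered_membership <- membership[hc$order]   # ids in dendrogram (plot) order

# A separator goes after every row whose successor is in another cluster
boundaries <- which(diff(ordered_membership) != 0)
boundaries
```

Clusters cut from a dendrogram occupy contiguous blocks in `hc$order`,
so `k` groups always give `k - 1` separators.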

### 1.4 Legend position, custom palette, title

```{r h-legend}
sequence_plot(trajectories,
              legend = "bottom",
              legend_title = "Engagement",
              state_colors = c("#2a9d8f", "#e9c46a", "#e76f51"),
              main = "Custom palette + bottom legend")
```

### 1.5 Cell borders + `tick` thinning

```{r h-borders}
sequence_plot(trajectories,
              cell_border = "grey60", tick = 3,
              main = "Cell grid + every-3rd tick")
```

### 1.6 `frame = TRUE` brings back the outer box

```{r h-frame}
sequence_plot(trajectories, frame = TRUE,
              main = "frame = TRUE")
```

## 2. `type = "index"` — gap-ready carpet with facets

The index view draws no dendrogram. Rows are sorted within each panel
according to `sort`. Panels come from `group`, given either as a vector
or taken automatically from a `net_clustering` object, and `ncol` /
`nrow` set the facet grid.

### 2.1 Single panel

```{r i-default}
sequence_plot(trajectories, type = "index",
              main = "index — single panel")
```

### 2.2 Visible row gaps

```{r i-gap}
sequence_plot(trajectories, type = "index", row_gap = 0.25,
              main = "index with row_gap = 0.25")
```

### 2.3 Faceted by `net_clustering` (auto 2×2 for k = 3)

```{r i-cluster}
# Cluster once; `cl` is reused by the faceted plots below
cl <- build_clusters(as.data.frame(trajectories), k = 3L,
                     dissimilarity = "hamming", method = "ward.D2")
sequence_plot(cl, type = "index",
              main = "index faceted by build_clusters(k = 3)")
```
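
One rule that yields the 2×2 grid mentioned in the heading is a
near-square layout. The helper below, `facet_grid_dims()`, is a
hypothetical name and the rule is an assumption; the source does not say
how Nestimate picks its automatic grid.

```{r sketch-facet-grid}
# Hypothetical helper: near-square grid for a given number of panels
facet_grid_dims <- function(n_panels) {
  ncol <- ceiling(sqrt(n_panels))
  nrow <- ceiling(n_panels / ncol)
  c(nrow = nrow, ncol = ncol)
}

facet_grid_dims(3)  # 2 rows x 2 columns, one cell left empty
facet_grid_dims(6)  # 2 rows x 3 columns
```

When the empty cell is unwanted, passing `ncol` and `nrow` explicitly
overrides the automatic choice.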

### 2.4 Force a 1×3 row

```{r i-row, fig.width=9, fig.height=4}
sequence_plot(cl, type = "index", ncol = 3, nrow = 1,
              main = "index — ncol = 3, nrow = 1")
```

## 3. `type = "distribution"` — state proportions over time

A stacked area (default) or bar chart of the state frequencies in each
time column.

### 3.1 Default stacked area

```{r d-area}
sequence_plot(trajectories, type = "distribution",
              main = "distribution — stacked area")
```
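
The numbers behind such a chart are per-column state shares. A minimal
base-R sketch on a made-up 4 × 3 matrix (not the package data, and not
necessarily how Nestimate computes them) looks like this:

```{r sketch-state-proportions}
# Toy data: 4 sequences x 3 weeks, with one missed week
toy <- rbind(
  c("Active",  "Active",  "Average"),
  c("Active",  "Average", "Disengaged"),
  c("Average", "Average", "Disengaged"),
  c("Active",  NA,        "Disengaged")
)
colnames(toy) <- paste0("W", 1:3)
states <- c("Active", "Average", "Disengaged")

# Count each state per week; NA cells are skipped in this sketch
counts <- sapply(states, function(s) colSums(toy == s, na.rm = TRUE))

# Rows = weeks, columns = states; each row sums to 1
props <- counts / rowSums(counts)
props
```

Each row of `props` is one vertical slice of the stacked chart; using
`counts` instead corresponds to a count scale.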

### 3.2 Stacked bars, count scale

```{r d-bar-count}
sequence_plot(trajectories, type = "distribution",
              geom = "bar", scale = "count",
              main = "distribution — bars, count scale")
```

### 3.3 NA band on/off

```{r d-na-true}
sequence_plot(trajectories, type = "distribution", na = TRUE,
              main = "na = TRUE")
```

```{r d-na-false}
sequence_plot(trajectories, type = "distribution", na = FALSE,
              main = "na = FALSE")
```
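
The effect of an `NA` band on the shares can be reproduced with
`table()`. The sketch uses one made-up week of eight students. Whether
`na = FALSE` renormalises the remaining states, as done here, or only
hides the band is not stated above, so treat the second variant as an
assumption.

```{r sketch-na-band}
week <- c("Active", "Average", NA, "Active", NA,
          "Disengaged", "Active", "Average")

# NA counted as its own category (what an NA band displays)
with_na <- prop.table(table(week, useNA = "always"))

# NA dropped, shares renormalised over observed cells only
without_na <- prop.table(table(week, useNA = "no"))

round(with_na, 3)
round(without_na, 3)
```

With the band, `Active` covers 3 of 8 cells; without it, 3 of the 6
observed cells. Heavy missingness can therefore change the visual
impression of the same data.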

### 3.4 Faceted by cluster

```{r d-cluster}
sequence_plot(cl, type = "distribution",
              main = "distribution by cluster (k = 3)")
```

## Cheat sheet

```{r cheatsheet, eval = FALSE}
# Always explore first with the default:
sequence_plot(trajectories)

# Zoom in on cluster structure:
sequence_plot(trajectories, k = 3)
sequence_plot(trajectories, sort = "hamming", k = 4)

# Compare cluster compositions:
cl <- build_clusters(as.data.frame(trajectories), k = 3,
                     dissimilarity = "hamming", method = "ward.D2")
sequence_plot(cl, type = "index")
sequence_plot(cl, type = "distribution")

# Polish for a paper:
sequence_plot(trajectories, k = 3,
              state_colors = c("#2a9d8f", "#e9c46a", "#e76f51"),
              legend_title = "Engagement",
              legend = "bottom",
              cell_border = "grey70",
              main = "Student engagement trajectories")
```