---
title: "tlsR Workflow: From Raw Imaging Data to TLS Characterisation"
author: "Ali Amiryousefi"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{tlsR Workflow: From Raw Imaging Data to TLS Characterisation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.width = 7,
  fig.height = 6,
  eval = TRUE
)
```

## Introduction

Tertiary lymphoid structures (TLS) are ectopic lymphoid organs that form in
non-lymphoid tissues -- most notably in tumours -- and are associated with
improved patient outcomes and immunotherapy response.  **tlsR** provides a
fast, reproducible pipeline for detecting TLS and characterising their spatial
organisation in multiplexed tissue imaging data (e.g. mIHC, CODEX, IMC).

The core pipeline is:

```
Raw ldata list
     |
     v
detect_TLS()        <- KNN-based B+T co-localisation
     |
     +--> scan_clustering()   <- Sliding-window Ripley's L clustering map
     |
     +--> calc_icat()         <- ICAT spatial-spread score per TLS
     |
     +--> detect_tic()        <- T-cell clusters outside TLS
     |
     +--> summarize_TLS()     <- Tidy summary table
     |
     +--> plot_TLS()          <- Publication-ready spatial plot
```

---

## Data Format

`tlsR` expects a **named list of data frames** (`ldata`), one element per
tissue sample.  Each data frame must contain at minimum:

| Column      | Type      | Description                                      |
|-------------|-----------|--------------------------------------------------|
| `x`         | numeric   | X coordinate in microns                          |
| `y`         | numeric   | Y coordinate in microns                          |
| `phenotype` | character | Cell label; must contain `"B cell"` / `"T cell"` |

Additional columns (e.g. cell area, marker intensities) are silently ignored.

```{r load-data}
library(tlsR)

data(toy_ldata)

# Structure of the built-in example dataset
str(toy_ldata)
table(toy_ldata[["ToySample"]]$phenotype)
```
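
For your own data, an `ldata` list can be assembled from any per-cell table.
The sketch below builds a two-sample list from simulated coordinates and checks
the required columns.  The sample names, the `"Other"` label, and the helpers
`make_sample()` and `check_ldata()` are made up for illustration and are not
part of tlsR.

```r
set.seed(1)

# Hypothetical helper: simulate n cells on a 1000 x 1000 um field.
make_sample <- function(n) {
  data.frame(
    x         = runif(n, 0, 1000),
    y         = runif(n, 0, 1000),
    phenotype = sample(c("B cell", "T cell", "Other"), n, replace = TRUE),
    stringsAsFactors = FALSE
  )
}

# A named list with one data frame per tissue sample.
my_ldata <- list(SampleA = make_sample(200), SampleB = make_sample(300))

# Hypothetical check (not a tlsR function): required columns and their types.
check_ldata <- function(ldata) {
  required <- c("x", "y", "phenotype")
  vapply(ldata, function(df) {
    all(required %in% names(df)) &&
      is.numeric(df$x) && is.numeric(df$y) &&
      is.character(df$phenotype)
  }, logical(1L))
}

check_ldata(my_ldata)
```

Running a check of this kind before the pipeline can catch factor-typed
phenotype columns or misnamed coordinate columns early.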

---

## Step 1 -- Detect TLS with `detect_TLS()`

`detect_TLS()` identifies B-cell-rich regions with sufficient T-cell
co-localisation using a KNN density approach.

```{r detect-tls}
data(toy_ldata)

ldata <- detect_TLS(
  LSP                     = "ToySample",
  k                       = 10,     # neighbours for density estimation
  bcell_density_threshold = 17,     # min avg 1/k-distance (um)
  min_B_cells             = 100,    # min B cells per candidate TLS
  min_T_cells_nearby      = 5,      # min T cells within max_distance_T
  max_distance_T          = 50,     # search radius (um)
  expand_distance         = 100,    # radius for expanding candidate TLS (um)
  ldata                   = toy_ldata
)

table(ldata[["ToySample"]]$tls_id_knn)
```

The new column `tls_id_knn` is `0` for non-TLS cells and a positive integer
(1, 2, 3, and so on) for cells assigned to a TLS.
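
To build intuition for a KNN density measure, the sketch below computes the
mean distance from each cell to its `k` nearest neighbours on simulated B
cells.  It is a brute-force illustration of the general idea, not the code
inside `detect_TLS()`; the helper `knn_mean_dist()` and the simulated data are
assumptions.

```r
set.seed(42)

# Simulated B cells: a tight cluster of 60 cells plus 60 scattered cells.
clustered <- cbind(rnorm(60, 500, 15), rnorm(60, 500, 15))
scattered <- cbind(runif(60, 0, 1000), runif(60, 0, 1000))
xy <- rbind(clustered, scattered)

# Hypothetical helper: mean distance from each cell to its k nearest
# neighbours, using the full pairwise distance matrix (O(n^2) memory).
knn_mean_dist <- function(xy, k) {
  d <- as.matrix(dist(xy))
  diag(d) <- Inf  # exclude each cell's zero distance to itself
  apply(d, 1, function(row) mean(sort(row)[seq_len(k)]))
}

md <- knn_mean_dist(xy, k = 10)

# Cells in the cluster sit much closer to their neighbours.
c(clustered = median(md[1:60]), scattered = median(md[61:120]))
```

Cells in dense regions have a small mean neighbour distance, so a threshold on
this quantity (or on its inverse) separates candidate aggregates from
background.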

### Quick base-R check plot

```{r base-plot, fig.alt="Scatter plot of ToySample cells coloured by TLS membership"}
df <- ldata[["ToySample"]]

plot(df$x[df$tls_id_knn == 0],
     df$y[df$tls_id_knn == 0],
     col  = "grey80", pch = 19, cex = 0.3,
     xlab = "x (um)", ylab = "y (um)",
     main = "Detected TLS -- ToySample")

points(df$x[df$tls_id_knn > 0],
       df$y[df$tls_id_knn > 0],
       col = "#0072B2", pch = 19, cex = 0.4)

legend("bottomright",
       legend = c("Background", "TLS"),
       col    = c("grey80", "#0072B2"),
       pch    = 19, pt.cex = 1.2, bty = "n")
```

---

## Step 2 -- Local Ripley's L Map with `scan_clustering()`

`scan_clustering()` slides a square window across the tissue and computes the
**K-integral** clustering index in each window -- the mean positive excess of
the observed Ripley's L over its theoretical value under complete spatial
randomness (CSR).
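
As a rough illustration of this index, the sketch below estimates Ripley's L
in a single window with a naive estimator (no edge correction) and averages
the positive part of `L(r) - r`, since `L(r) = r` is the expectation for a
completely random pattern.  The helpers `ripley_L()` and
`mean_positive_excess()`, the radii, and the simulated points are assumptions
for illustration; `scan_clustering()` may differ in estimator, edge correction,
and smoothing.

```r
set.seed(7)

# Naive Ripley's L without edge correction (illustration only).
ripley_L <- function(xy, r, area) {
  n <- nrow(xy)
  d <- as.matrix(dist(xy))
  diag(d) <- Inf  # a point is not its own neighbour
  # K(r) = area / (n * (n - 1)) * number of ordered pairs within distance r
  K <- vapply(r, function(ri) area * sum(d <= ri) / (n * (n - 1)), numeric(1L))
  sqrt(K / pi)
}

# Mean positive excess of L(r) over the random-pattern expectation L(r) = r.
mean_positive_excess <- function(xy, r, area) {
  mean(pmax(ripley_L(xy, r, area) - r, 0))
}

ws <- 1000                    # window side (um)
r  <- seq(10, 200, by = 10)   # radii at which L is evaluated

random_pts    <- cbind(runif(200, 0, ws), runif(200, 0, ws))
clustered_pts <- cbind(rnorm(200, 500, 40), rnorm(200, 500, 40))

c(random    = mean_positive_excess(random_pts, r, ws^2),
  clustered = mean_positive_excess(clustered_pts, r, ws^2))
```

The clustered pattern yields a far larger value than the random one, which is
the contrast the per-window index is meant to capture.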

When `plot = TRUE` (the default), a spatial map is produced showing:

- All cells as small light-grey points.
- Phenotype cells coloured green (T cells) or red (B cells).
- A navy dashed grid marking window boundaries.
- A LOESS-smoothed L-excess curve overlaid inside each qualifying window.
- A bold numeric clustering-intensity (CI) label centred in each window.
- A legend identifying all point and curve colours.

### Single-phenotype map

```{r scan-B, eval = FALSE}
# eval=FALSE because this can take ~10--30 s on real data
L_B <- scan_clustering(
  ws             = 1000,        # window side (um)
  sample         = "ToySample",
  phenotype      = "B cells",
  plot           = TRUE,
  creep          = 1L,
  min_cells      = 10L,
  min_phen_cells = 5L,
  label_cex      = 1.1,        # increase if CI labels look small
  ldata          = ldata
)

cat("B-cell windows analysed:", length(L_B$B), "\n")
```

```{r scan-T, eval = FALSE}
L_T <- scan_clustering(
  ws        = 500,
  sample    = "ToySample",
  phenotype = "T cells",
  plot      = TRUE,
  ldata     = ldata
)

cat("T-cell windows analysed:", length(L_T$T), "\n")
```

### Side-by-side B and T cell panels

When `phenotype = "Both"`, two panels are drawn side by side -- one for B cells
and one for T cells -- with a shared super-title, making it easy to compare
clustering intensity across compartments.

```{r scan-both, eval = FALSE}
L_both <- scan_clustering(
  ws        = 3000,
  sample    = "ToySample",
  phenotype = "Both",
  plot      = TRUE,
  ldata     = ldata
)

cat("B windows:", length(L_both$B), " | T windows:", length(L_both$T), "\n")
```

The returned list has named elements `$B` and `$T`, each containing `Lest`
objects for the qualifying windows of that phenotype.  Individual L curves can
be inspected or plotted directly from these objects.

---

## Step 3 -- ICAT Score with `calc_icat()`

The **ICAT (Immune Cell Arrangement Trace)** index quantifies the spatial
spread and linear organisation of cells within a TLS.  A higher value
indicates a more spatially extended, structured cluster.

### How it works

`calc_icat()` applies FastICA to the centred (x, y) coordinates of TLS cells,
reconstructs the data as
\(
  \hat{X} = S A^T + \mu
\),
and computes the normalised trace-standard-deviation:
\[
  \text{ICAT} = 100 \times
    \frac{\sqrt{v_1 + v_2 + 2\sqrt{v_1 v_2}}}{\text{nrow}(X)}
\]
where \(v_1, v_2\) are the marginal variances of \(\hat{X}\).  This
formulation is **always non-negative** -- it reflects average spatial spread per
cell in microns, rather than the signed trace of the raw mixing matrix which
can be negative due to ICA sign ambiguity.
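
A small numeric check makes the non-negativity concrete.  Because
\(v_1 + v_2 + 2\sqrt{v_1 v_2} = (\sqrt{v_1} + \sqrt{v_2})^2\), the numerator
is simply the sum of the two marginal standard deviations.  The sketch below
evaluates the formula on simulated coordinates, using the raw marginal
variances in place of those of \(\hat{X}\) (an assumption that holds when the
reconstruction closely reproduces the input); it does not call FastICA and is
not the package's implementation.

```r
set.seed(3)

# Simulated TLS: 150 cells, elongated along x (coordinates in um).
X <- cbind(x = rnorm(150, 0, 60), y = rnorm(150, 0, 20))

v1 <- var(X[, "x"])
v2 <- var(X[, "y"])

# Formula as written above, with the marginal variances of the raw
# coordinates standing in for those of the reconstruction (assumption).
icat <- 100 * sqrt(v1 + v2 + 2 * sqrt(v1 * v2)) / nrow(X)

# Equivalent form: the numerator is sd_x + sd_y.
icat_alt <- 100 * (sd(X[, "x"]) + sd(X[, "y"])) / nrow(X)

all.equal(icat, icat_alt)
```

Whether the package uses the sample variance (denominator `n - 1`, as `var()`
does) or the population variance is not stated here, so absolute values from
this sketch may differ slightly from `calc_icat()`.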

```{r icat}
n_tls <- max(ldata[["ToySample"]]$tls_id_knn, na.rm = TRUE)

if (n_tls >= 1L) {
  icat_scores <- vapply(
    seq_len(n_tls),
    function(id) calc_icat("ToySample", tlsID = id, ldata = ldata),
    numeric(1L)
  )
  names(icat_scores) <- paste0("TLS", seq_len(n_tls))
  print(icat_scores)
}
```

`calc_icat()` returns `NA` (with a message) if a TLS has too few cells or if
FastICA fails to converge -- no errors are thrown.

---

## Step 4 -- Detect T-cell Clusters with `detect_tic()`

T-cell clusters (TIC) that lie *outside* TLS are identified with HDBSCAN.
The `min_pts` and `min_cluster_size` arguments let you control sensitivity.

```{r detect-tic}
ldata <- detect_tic(
  sample           = "ToySample",
  min_pts          = 20,    # HDBSCAN minPts
  min_cluster_size = 100,   # drop clusters smaller than this
  ldata            = ldata
)

table(
  ldata[["ToySample"]]$tcell_cluster_hdbscan[
    ldata[["ToySample"]]$tcell_cluster_hdbscan != 0
  ],
  useNA = "ifany"
)
```
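
The effect of `min_cluster_size` can be pictured as a post-filter on cluster
labels.  The sketch below is an assumption about the general idea, not the code
inside `detect_tic()`: `drop_small_clusters()` is a hypothetical helper, and
label `0` is taken to mean "not in a cluster", as the `!= 0` filter above
suggests.

```r
# Hypothetical helper: set labels of clusters smaller than min_size to 0.
drop_small_clusters <- function(labels, min_size) {
  sizes <- table(labels[labels != 0])
  small <- as.integer(names(sizes)[sizes < min_size])
  labels[labels %in% small] <- 0L
  labels
}

# Toy labels: cluster 1 has 120 cells, cluster 2 has 30, and 50 are noise (0).
labels <- c(rep(1L, 120), rep(2L, 30), rep(0L, 50))

filtered <- drop_small_clusters(labels, min_size = 100)
table(filtered)
```

Raising the size cut-off removes small aggregates from the result, whereas
`min_pts` acts earlier, on how HDBSCAN forms clusters in the first place.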

---

## Step 5 -- Summary Table with `summarize_TLS()`

`summarize_TLS()` produces a tidy one-row-per-sample summary -- convenient for
downstream statistical analysis.

```{r summary}
sumtbl <- summarize_TLS(ldata, calc_icat_scores = FALSE)
print(sumtbl)
```

With `calc_icat_scores = TRUE`, a list-column `icat_scores` is appended
containing named numeric vectors of per-TLS ICAT values (always non-negative).
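
A list-column is convenient for storage but awkward for modelling, so it often
helps to flatten it to one row per TLS.  The sketch below does this in base R
on a mock table; `sumtbl_demo`, its `sample` column, and the score values are
invented for illustration and do not reflect the actual output of
`summarize_TLS()`.

```r
# Mock summary table with a list-column of per-TLS scores (made-up values).
sumtbl_demo <- data.frame(sample = c("S1", "S2"), stringsAsFactors = FALSE)
sumtbl_demo$icat_scores <- list(
  c(TLS1 = 1.2, TLS2 = 0.8),
  c(TLS1 = 2.5)
)

# Flatten to one row per sample-TLS pair.
icat_long <- do.call(rbind, lapply(seq_len(nrow(sumtbl_demo)), function(i) {
  s <- sumtbl_demo$icat_scores[[i]]
  data.frame(
    sample = sumtbl_demo$sample[i],
    tls    = names(s),
    icat   = unname(s),
    stringsAsFactors = FALSE
  )
}))

icat_long
```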

---

## Step 6 -- Visualise with `plot_TLS()`

`plot_TLS()` produces a ggplot2 scatter plot with TLS and TIC coloured
distinctly using a colourblind-friendly palette.

### Rendering improvements

Two aesthetics have been tuned for clarity:

- **Background cells** are drawn with `bg_alpha = 0.25`, so they recede and the
  foreground TLS and TIC structure is immediately visible.
- **TIC cells** are drawn at `point_size * tic_size_mult` (default multiplier
  `1.8`), making them larger than TLS cells without dominating the plot.

Both parameters are fully exposed as function arguments so you can fine-tune
them for your data density.

```{r plot-tls, fig.alt="ggplot2 spatial map of ToySample with TLS and TIC highlighted"}
p <- plot_TLS(
  sample        = "ToySample",
  ldata         = ldata,
  show_tic      = TRUE,
  point_size    = 0.5,
  alpha         = 0.7,     # TLS / TIC cells
  bg_alpha      = 0.25,    # background cells (more transparent)
  tic_size_mult = 1.8      # TIC cells drawn at 1.8x the base point size
)
```

The returned `ggplot` object can be further customised with standard ggplot2
functions:

```{r plot-custom, fig.alt="Customised TLS plot with additional title"}
library(ggplot2)
p + labs(title = "ToySample -- Your custom title")
```

---

## Multi-Sample Workflow

`tlsR` is designed to scale naturally to many samples.  Simply pass your
full `ldata` list and iterate:

```{r multi-sample, eval = FALSE}
samples <- names(ldata)

ldata <- Reduce(function(ld, s) detect_TLS(s, ldata = ld), samples, ldata)
ldata <- Reduce(function(ld, s) detect_tic(s,  ldata = ld), samples, ldata)

summary_all <- summarize_TLS(ldata)
print(summary_all)
```
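
The `Reduce()` idiom above works because each step takes the full list and
returns the full list.  The sketch below shows the same pattern with a
stand-in function; `tag_sample()` and the toy list are hypothetical and only
mimic the "sample name in, updated list out" shape of the tlsR functions.

```r
# Hypothetical stand-in for a pipeline step: takes a sample name and the
# full list, modifies that one sample, and returns the full list.
tag_sample <- function(sample, ldata) {
  ldata[[sample]]$processed <- TRUE
  ldata
}

toy <- list(A = data.frame(x = 1:2), B = data.frame(x = 3:4))

# Reduce() starts from `toy` and feeds the result of each call into the next.
out <- Reduce(function(ld, s) tag_sample(s, ldata = ld), names(toy), toy)

vapply(out, function(df) all(df$processed), logical(1L))
```

An explicit `for` loop that reassigns the list on each iteration is equivalent;
`Reduce()` simply avoids the intermediate assignments.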

For `scan_clustering()` across many samples:

```{r multi-scan, eval = FALSE}
# Generate one spatial map per sample (side-by-side B and T panels)
for (s in names(ldata)) {
  scan_clustering(
    ws        = 500,
    sample    = s,
    phenotype = "Both",    # two-panel plot: B cells | T cells
    plot      = TRUE,
    label_cex = 1.2,       # slightly larger CI labels for presentation
    ldata     = ldata
  )
}
```

---

## Session Info

```{r session}
sessionInfo()
```