Many methods in machine learning, spatial analysis, and network science share a common first step: how similar is each data point to its neighbors? The answer is an adjacency matrix — an n×n sparse matrix where entry (i, j) holds the similarity between points i and j, and off-neighbor entries are zero.
adjoin constructs these matrices from three kinds of inputs.
The output is always a neighbor_graph, a lightweight
object that exposes a consistent interface for extracting the adjacency
matrix, computing the graph Laplacian, and inspecting edges.
The easiest entry point is graph_weights(). Give it a
data matrix, a neighborhood size k, and a weight mode, and
it returns a ready-to-use neighbor_graph.
set.seed(42)
X <- as.matrix(iris[, 1:4]) # 150 flowers × 4 measurements
ng <- graph_weights(X, k = 5,
neighbor_mode = "knn",
weight_mode = "heat")
A <- adjacency(ng) # sparse 150×150 similarity matrix
cat("nodes:", nvertices(ng), " | edges:", nnzero(A) / 2, "\n")
#> nodes: 150  | edges: 508.5

graph_weights() searched for the 5 nearest Euclidean
neighbors of each flower, converted distances to similarities with a
heat kernel (exp(−d²/2σ²)), then symmetrized the result. The adjacency
matrix is stored as a sparse Matrix — most of its 22,500
entries are zero.
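As a sanity check, the heat-kernel weighting can be sketched by hand in base R (a dense illustration only; graph_weights() itself uses sparse storage and a real neighbor search, and its tie-breaking may differ):

```r
X <- as.matrix(iris[, 1:4])
d <- as.matrix(dist(X))            # full pairwise Euclidean distance matrix
sigma <- 0.5
W <- exp(-d^2 / (2 * sigma^2))     # heat kernel: exp(-d^2 / (2 * sigma^2))
# keep each point's 5 nearest neighbors (rank 1 is the point itself)
keep <- t(apply(d, 1, function(r) rank(r, ties.method = "first") <= 6))
diag(keep) <- FALSE
A_hand <- W * (keep | t(keep))     # union symmetrization, as in type = "normal"
```

Because W and the union mask are both symmetric, A_hand is symmetric, and every entry lies in (0, 1].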
Adjacency matrix reordered by species. The three diagonal blocks show that flowers connect mostly within their own species.
The three diagonal blocks confirm that the graph is recovering species structure from raw petal and sepal measurements — no labels required.
Every feature-similarity graph follows the same three-step pipeline.
Step 1 — Prepare data. Rows are observations; columns are features.
Step 2 — Build the graph. Pass the matrix to
graph_weights().
Step 3 — Use the graph. Extract what downstream methods need.
A <- adjacency(ng) # sparse similarity matrix
L <- laplacian(ng) # L = D − A
L_norm <- laplacian(ng, normalized = TRUE) # I − D^{-1/2} A D^{-1/2}

The Laplacian is the backbone of spectral clustering, diffusion maps, and graph-regularized learning. Both the unnormalized and symmetric-normalized forms are available directly.
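For intuition, both Laplacians can be assembled by hand from any adjacency matrix with the Matrix package (a sketch on a 3-node path graph; laplacian() wraps this bookkeeping for you):

```r
library(Matrix)
# a 3-node path graph: 1 - 2 - 3
A <- Matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), 3, 3, sparse = TRUE)
deg    <- rowSums(A)
L      <- Diagonal(x = deg) - A                  # unnormalized: L = D - A
D_half <- Diagonal(x = 1 / sqrt(deg))
L_norm <- Diagonal(3) - D_half %*% A %*% D_half  # I - D^{-1/2} A D^{-1/2}
rowSums(L)                                       # each row sums to zero
```

The zero row sums are the defining property of the unnormalized Laplacian: the constant vector is always in its null space.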
Most users should start with graph_weights(). It takes a
data matrix, finds k-nearest neighbors with exact Euclidean search,
converts distances to similarities, applies the requested symmetry rule,
and returns a neighbor_graph with the construction
parameters stored in $params.
Use the lower-level constructors when you need a different return type or more control over the search step:
| Function | Return value | Use it when |
|---|---|---|
| `graph_weights()` | `neighbor_graph` | You want the standard feature-similarity graph API: `adjacency()`, `laplacian()`, `edges()`, and stored parameters. |
| `weighted_knn()` | `igraph` or sparse `Matrix` | You want a raw weighted kNN graph/matrix and will manage metadata yourself. |
| `graph_weights_fast()` | sparse `Matrix` | You want the faster matrix-only path, self-tuned weights, or explicit backend control. Exact Rnanoflann search is the default; approximate HNSW is used only with `backend = "hnsw"`. |
| `neighbor_graph()` | `neighbor_graph` wrapper | You already have an igraph, adjacency matrix, or nnsearcher result and want to wrap it in the package’s graph object. |
In short, graph_weights() is the high-level constructor,
weighted_knn() and graph_weights_fast() are
matrix/graph builders, and neighbor_graph() is the object
wrapper used after a graph already exists.
A neighbor_graph is a named list:
names(ng)
#> [1] "G" "params"
str(ng$params, max.level = 1)
#> List of 6
#> $ k : num 5
#> $ neighbor_mode: chr "knn"
#> $ weight_mode : chr "heat"
#> $ sigma : num 0.5
#> $ type : chr "normal"
#> $ labels : NULL

- `$G` — an igraph object holding the full topology
- `$params` — the construction parameters, kept for reproducibility

Three accessors cover the most common needs:
nvertices(ng) # number of nodes
#> [1] 150
head(edges(ng)) # edge list as a character matrix
#> [,1] [,2]
#> [1,] "1" "5"
#> [2,] "1" "8"
#> [3,] "1" "18"
#> [4,] "1" "28"
#> [5,] "1" "29"
#> [6,] "1"  "38"

The weight_mode argument maps Euclidean distances to
similarity scores.
| Mode | Similarity | Best for |
|---|---|---|
| `"heat"` | exp(−d²/2σ²) | General purpose; default |
| `"binary"` | 1 for every neighbor | Unweighted structural analysis |
| `"cosine"` | dot product after L2 norm | Direction matters, not magnitude |
| `"normalized"` | correlation-like | Zero-mean, unit-variance features |
ng_norm <- graph_weights(X, k = 5, weight_mode = "normalized")
ng_cos <- graph_weights(X, k = 5, weight_mode = "cosine")
ng_heat <- graph_weights(X, k = 5, weight_mode = "heat", sigma = 0.5)

Edge weight distributions for three weight modes. All three spread similarity scores continuously across (0, 1].
Soft weights preserve the degree of similarity, not just its
presence — nearby neighbors score near 1, distant ones near 0. The
"binary" mode (not shown) produces unweighted graphs where
every neighbor gets weight 1.
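Each formula in the table can be evaluated directly on a single pair of points (a sketch; sigma = 0.5 matches the default stored in $params, and using cor() for the "normalized" mode is an assumption based on its "correlation-like" description):

```r
x <- c(1, 2, 3); y <- c(2, 2, 4)
d <- sqrt(sum((x - y)^2))                           # Euclidean distance
sigma <- 0.5
w_heat   <- exp(-d^2 / (2 * sigma^2))               # "heat"
w_binary <- 1                                       # "binary": presence only
w_cosine <- sum(x * y) / sqrt(sum(x^2) * sum(y^2))  # "cosine": L2-normalized dot product
w_norm   <- cor(x, y)                               # "normalized": correlation-like (assumed)
```

Note how the heat weight decays quickly with distance while the cosine weight ignores magnitude entirely.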
By default, if either point nominated the other as a neighbor the
edge is included (type = "normal", union). Two alternatives
trade off sparsity and confidence:
# mutual: both must nominate each other (sparser, higher confidence)
ng_mutual <- graph_weights(X, k = 5, weight_mode = "heat",
type = "mutual")
# asym: directed — i→j does not imply j→i
ng_asym <- graph_weights(X, k = 5, weight_mode = "heat",
type = "asym")
# edge counts: normal ≥ mutual
c(normal = nnzero(adjacency(ng)) / 2,
mutual = nnzero(adjacency(ng_mutual)) / 2)
#> normal mutual
#>  508.5  241.5

Mutual graphs are useful when false positives are costly — for example, selecting a reliable training neighborhood for a classifier.
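The two symmetrization rules are easy to see on a toy directed nomination matrix, where entry [i, j] = 1 means j is among i's neighbors (a sketch, independent of the package):

```r
# directed nominations: 1 -> 2, 2 -> 3, 3 -> 1 (no pair is mutual)
K <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
A_union  <- (K | t(K)) * 1   # type = "normal": either direction suffices
A_mutual <- (K & t(K)) * 1   # type = "mutual": both directions required
c(union = sum(A_union) / 2, mutual = sum(A_mutual) / 2)
```

Here the union rule keeps all three edges while the mutual rule keeps none, the extreme case of the sparsity/confidence trade-off described above.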
For repeated queries, build an nnsearcher index once and
query it as many times as needed.
Find neighbors for every point:
nn_result <- find_nn(searcher, k = 5)
names(nn_result) # indices, distances, labels
#> [1] "labels" "indices" "distances"Search within a subset (e.g., setosa flowers only):
Search between two groups (setosa vs. versicolor):
Build a neighbor_graph from the index:
The two-step approach is useful when you need to inspect raw distances before committing to a kernel, or when you want to reuse the index across multiple graph configurations.
When class labels are available, class_graph() connects
every pair of same-class points — creating a fully connected block
structure.
class_graph adjacency for iris. Each block is a fully connected class; no edges cross species boundaries.
class_graph objects are neighbor_graph
subclasses — every accessor (adjacency(),
laplacian(), edges()) works on them too. They
also support class-specific queries:
# within-class neighbors for every point
wc <- within_class_neighbors(cg, X, k = 3)
# nearest between-class neighbors for every point
bc <- between_class_neighbors(cg, X, k = 3)

This is the building block for supervised dimensionality reduction methods like LDA and neighborhood component analysis.
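The fully connected block pattern can be sketched from the labels alone (a dense illustration; class_graph() wraps the equivalent structure in a neighbor_graph):

```r
y <- as.character(iris$Species)
A_class <- outer(y, y, "==") * 1   # 1 iff two flowers share a species
diag(A_class) <- 0                 # no self-loops
# each of the 50 setosa flowers connects to the other 49, and to no other species
```

The result is exactly the block-diagonal pattern shown in the figure above: dense within each class, zero everywhere else.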
To connect two different datasets, use
cross_adjacency():
X_ref <- as.matrix(iris[1:100, 1:4]) # 100 reference points
X_query <- as.matrix(iris[101:150, 1:4]) # 50 query points
# 50×100 sparse matrix: row i = query flower, col j = nearest reference
C <- cross_adjacency(X_ref, X_query, k = 3, as = "sparse")
dim(C)
#> [1]  50 100

Each row holds the heat-kernel similarities from one query point to its 3 nearest matches in the reference set. This is the foundation for cross-modal retrieval, domain adaptation, and inter-subject alignment.
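A hand-rolled approximation makes the structure of that matrix concrete (a dense sketch; the heat-kernel form follows the description above, but sigma = 0.5 and the exact tie-breaking are assumptions for illustration):

```r
X_ref   <- as.matrix(iris[1:100, 1:4])   # 100 reference points
X_query <- as.matrix(iris[101:150, 1:4]) # 50 query points
k <- 3; sigma <- 0.5
C_hand <- matrix(0, nrow(X_query), nrow(X_ref))
for (i in seq_len(nrow(X_query))) {
  d <- sqrt(colSums((t(X_ref) - X_query[i, ])^2))  # distances to all references
  nn <- order(d)[1:k]                              # the 3 nearest references
  C_hand[i, nn] <- exp(-d[nn]^2 / (2 * sigma^2))   # heat-kernel similarities
}
dim(C_hand)
```

Each row ends up with exactly k nonzero entries, one per nearest reference match, which is why the matrix is rectangular rather than square.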
Raw adjacency matrices are degree-imbalanced: high-degree nodes
dominate. normalize_adjacency() applies the symmetric
degree normalization D^{-1/2} A D^{-1/2} used by the
package’s default spectral diffusion routines. It preserves symmetry, so
the row sums are not expected to equal one:
A_raw <- adjacency(ng)
A_norm <- normalize_adjacency(A_raw)
range(Matrix::rowSums(A_norm))
#> [1] 0.5762124 1.3103125
Matrix::isSymmetric(A_norm)
#> [1] TRUE

For a Markov transition matrix used in random-walk analysis, row-normalize by degree instead:
deg <- Matrix::rowSums(A_raw)
P_walk <- Matrix::Diagonal(x = ifelse(deg > 0, 1 / deg, 0)) %*% A_raw
range(Matrix::rowSums(P_walk)[deg > 0])
#> [1] 1 1

Use the symmetric normalization for spectral diffusion embeddings.
Use a row-stochastic transition matrix when the downstream analysis is
explicitly a random walk;
compute_diffusion_kernel(..., symmetric = FALSE) constructs
that transition internally.
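The two normalizations are spectrally equivalent: P = D^{-1} A and S = D^{-1/2} A D^{-1/2} are similar matrices (P = D^{-1/2} S D^{1/2}), so they share eigenvalues even though their eigenvectors differ. A quick check on a toy graph:

```r
library(Matrix)
A <- Matrix(c(0, 1, 1,
              1, 0, 0,
              1, 0, 0), 3, 3, sparse = TRUE)   # a 3-node star
deg <- rowSums(A)
S <- Diagonal(x = 1 / sqrt(deg)) %*% A %*% Diagonal(x = 1 / sqrt(deg))
P <- Diagonal(x = 1 / deg) %*% A
ev_S <- sort(eigen(as.matrix(S), only.values = TRUE)$values)
ev_P <- sort(Re(eigen(as.matrix(P), only.values = TRUE)$values))
all.equal(ev_S, ev_P)   # same spectrum, different eigenvectors
```

This is why the choice between the two comes down to the downstream method's conventions rather than the eigenvalue information itself.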
| Object | Created by | What it holds |
|---|---|---|
| `neighbor_graph` | `graph_weights()`, `neighbor_graph()` | igraph + construction params |
| sparse adjacency `Matrix` | `weighted_knn(..., as = "sparse")`, `graph_weights_fast()`, `adjacency()` | Numeric edge weights for downstream matrix methods |
| `igraph` | `weighted_knn(..., as = "igraph")` | Raw graph topology and edge weights |
| `class_graph` | `class_graph()` | Extends `neighbor_graph` with class structure |
| `nnsearcher` | `nnsearcher()` | Reusable nearest-neighbor search index |
| `nn_search` | `find_nn()`, `find_nn_among()`, `find_nn_between()` | Raw search result: indices, distances, labels |
- `spatial_adjacency()` builds graphs from 2-D/3-D grid coordinates rather than feature vectors; see `?spatial_adjacency`.
- `compute_diffusion_kernel()` and `compute_diffusion_map()` propagate information through a graph and yield spectral embeddings.
- `label_matrix()` and `expand_label_similarity()` create soft label-overlap matrices for semi-supervised settings.
- `estimate_sigma()` automatically picks a heat kernel bandwidth from your data distribution.