
Multi-Objective Clustering Guided by a-Priori Biological Knowledge.
moc.gapbk implements the MOC-GaPBK algorithm proposed by
Parraga-Alava and others (2018). The algorithm couples
NSGA-II with Path-Relinking (PR) and
Pareto Local Search (PLS) to produce a set of
non-dominated clustering solutions over two distance matrices: one
usually derived from the data, and a second one that encodes a-priori
biological (or domain) knowledge.
You can install the released version of moc.gapbk from CRAN with:
install.packages("moc.gapbk")
Or the development version from GitHub with:
# install.packages("pak")
pak::pak("jorgeklz/package-moc.gapbk")
A minimal end-to-end run on synthetic data:
library(moc.gapbk)
set.seed(2026)
# Toy data: 50 objects (e.g. genes) described by 20 features.
x <- matrix(stats::runif(50 * 20, min = -5, max = 10),
            nrow = 50, ncol = 20)
# Two distance matrices over the same set of objects.
d1 <- as.matrix(stats::dist(x, method = "euclidean"))
d2 <- as.matrix(stats::dist(x, method = "manhattan"))
res <- moc.gapbk(dmatrix1   = d1,
                 dmatrix2   = d2,
                 num_k      = 3,
                 generation = 5,
                 pop_size   = 6)
The result is a list with three components: population, matrix.solutions and clustering.
names(res)
#> [1] "population" "matrix.solutions" "clustering"
The first solution on the Pareto front, as a named integer vector ready to feed into a validation index or a plot:
head(res$clustering[[1]])
#> 1 2 3 4 5 6
#> 3 3 2 1 2 1
table(res$clustering[[1]])
#>
#> 1 2 3
#> 7 15 28
See vignette("moc-gapbk-intro") for a longer walkthrough, including how to enable the intensification and diversification stages with local_search = TRUE.
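As a rough illustration of feeding a clustering vector into a simple validation measure, the sketch below computes the mean pairwise distance inside each cluster using base R only. The `labels` vector is a stand-in produced by hierarchical clustering so that the snippet runs without the package; in practice you would substitute something like `res$clustering[[1]]`. The name `within_mean` is hypothetical and not part of moc.gapbk.

```r
set.seed(2026)

# Same kind of toy data as in the example above.
x  <- matrix(stats::runif(50 * 20, min = -5, max = 10), nrow = 50, ncol = 20)
d1 <- as.matrix(stats::dist(x, method = "euclidean"))

# Stand-in labels (assumption): replace with res$clustering[[1]] after a real run.
hc     <- stats::hclust(stats::as.dist(d1), method = "average")
labels <- stats::cutree(hc, k = 3)

# Mean pairwise distance between members of each cluster (lower = more compact).
cluster_ids <- sort(unique(labels))
within_mean <- vapply(cluster_ids, function(k) {
  idx <- which(labels == k)
  if (length(idx) < 2) return(0)          # a singleton has no pairs
  sub <- d1[idx, idx, drop = FALSE]
  mean(sub[upper.tri(sub)])
}, numeric(1))
names(within_mean) <- cluster_ids

print(within_mean)
```

The same function can be applied to the second distance matrix to check how compact each cluster is with respect to the prior-knowledge distances.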
| Stage | Component | Role |
|---|---|---|
| 1 | NSGA-II | Multi-objective evolutionary engine |
| 2 | Path-Relinking | Intensification between Pareto solutions |
| 3 | Pareto Local Search | Diversification across the front |
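The Pareto front that these stages work on is the set of non-dominated solutions: a solution is kept if no other solution is at least as good on both objectives and strictly better on one. The base R sketch below shows that filter under the assumption that both objectives are minimised. `non_dominated()` is a hypothetical helper written for illustration and is not the package's internal code.

```r
# Hypothetical helper: flag the non-dominated rows of a matrix whose
# columns are objective values to be minimised.
non_dominated <- function(obj) {
  n    <- nrow(obj)
  keep <- rep(TRUE, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Row j dominates row i if it is no worse everywhere and better somewhere.
      if (i != j &&
          all(obj[j, ] <= obj[i, ]) &&
          any(obj[j, ] <  obj[i, ])) {
        keep[i] <- FALSE
        break
      }
    }
  }
  keep
}

# Four candidate solutions scored on two objectives.
obj <- rbind(c(1, 4),
             c(2, 2),
             c(3, 3),   # dominated by c(2, 2)
             c(4, 1))
front <- non_dominated(obj)
print(obj[front, , drop = FALSE])
```

The quadratic double loop is fine for the small populations used here; it is meant to make the definition concrete rather than to be fast.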
Two versions of the Xie-Beni validity index are used
as objective functions, one per distance matrix. Typically
dmatrix1 is derived from the data itself and
dmatrix2 encodes prior knowledge (for example, GO semantic
similarity in gene-clustering applications).
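To make the two-objective setup concrete, here is a minimal sketch of a crisp, medoid-based Xie-Beni index evaluated directly on a distance matrix. This formulation (compactness around cluster medoids divided by the smallest squared medoid separation) is an assumption made for illustration; the package's internal definition may differ, and `xie_beni()` is a hypothetical name.

```r
# Hypothetical helper: crisp, medoid-based Xie-Beni index (lower is better).
xie_beni <- function(d, labels) {
  d      <- as.matrix(d)
  labels <- unname(labels)
  ks     <- sort(unique(labels))
  stopifnot(length(ks) >= 2, nrow(d) == length(labels))

  # Medoid of each cluster: the member closest, in total, to the other members.
  medoids <- vapply(ks, function(k) {
    idx <- which(labels == k)
    unname(idx[which.min(rowSums(d[idx, idx, drop = FALSE]))])
  }, integer(1))

  # Compactness: squared distance from every object to its own medoid.
  own_medoid <- medoids[match(labels, ks)]
  compact    <- sum(d[cbind(seq_along(labels), own_medoid)]^2)

  # Separation: smallest squared distance between two distinct medoids.
  md  <- d[medoids, medoids, drop = FALSE]
  sep <- min(md[upper.tri(md)])^2

  compact / (length(labels) * sep)
}

# Four points on a line: two tight pairs far apart.
d    <- as.matrix(stats::dist(c(0, 1, 10, 11)))
good <- xie_beni(d, c(1, 1, 2, 2))   # matches the obvious grouping
bad  <- xie_beni(d, c(1, 2, 1, 2))   # mixes the two pairs
print(c(good = good, bad = bad))
```

Under this sketch, a candidate clustering would be scored as `c(xie_beni(d1, labels), xie_beni(d2, labels))`, giving one objective value per distance matrix.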
If you use this package, please cite:
Parraga-Alava, J., Dorn, M., Inostroza-Ponta, M. (2018). A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies. BioData Mining 11(1), 1-16. https://doi.org/10.1186/s13040-018-0178-4
You can also run citation("moc.gapbk") inside R.
Please note that the moc.gapbk project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
GPL-2.