API
SubspaceClustering.SubspaceClustering — Module
Subspace clustering module. Provides algorithms for clustering data points by subspace.
SubspaceClustering.KSSResult — Type
KSSResult{
TU<:AbstractVector{<:AbstractMatrix{<:AbstractFloat}},
Tc<:AbstractVector{<:Integer},
T<:Real}
The output of kss.
Fields
U::TU: vector of subspace basis matrices U[1],...,U[K]
c::Tc: vector of cluster assignments c[1],...,c[N]
iterations::Int: number of iterations performed
totalcost::T: final value of total cost function
counts::Vector{Int}: vector of cluster sizes counts[1],...,counts[K]
converged::Bool: final convergence status
SubspaceClustering.TSCResult — Type
TSCResult{
TA<:AbstractMatrix{<:Real},
TE<:AbstractMatrix{<:Real},
TK<:KmeansResult,
Tc<:AbstractVector{<:Integer}}
The output of tsc.
Fields
affinity::TA: N×N TSC affinity matrix
embedding::TE: K×N TSC embedding matrix
kmeans_runs::Vector{TK}: vector of outputs from batched K-means
assignments::Tc: vector of final assignments
SubspaceClustering.kss — Method
kss(X::AbstractMatrix{<:Real}, d::AbstractVector{<:Integer};
maxiters = 100,
rng = default_rng(),
Uinit = [randsubspace(rng, size(X, 1), di) for di in d])
Cluster the N data points in the D×N data matrix X into K clusters via the K-subspaces (KSS) algorithm with corresponding subspace dimensions d[1],...,d[K]. Output is a KSSResult containing the resulting cluster assignments c[1],...,c[N], subspace basis matrices U[1],...,U[K], and metadata about the algorithm run.
KSS seeks to cluster data points by their subspace by minimizing the following total cost
\[\sum_{i=1}^N \| X[:, i] - U[c[i]] U[c[i]]' X[:, i] \|_2^2\]
with respect to the cluster assignments c[1],...,c[N] and subspace basis matrices U[1],...,U[K].
Keyword arguments
maxiters::Integer = 100: maximum number of iterations
rng::AbstractRNG = default_rng(): random number generator (used when reinitializing the subspace for an empty cluster)
Uinit::AbstractVector{<:AbstractMatrix{<:AbstractFloat}} = [randsubspace(rng, size(X, 1), di) for di in d]: vector of K initial subspace basis matrices to use (each Uinit[k] should be D×d[k])
See also KSSResult.
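The total cost above can be evaluated directly for a given set of bases and assignments. The following is a minimal sketch using only the LinearAlgebra standard library; `kss_cost` is a hypothetical helper written for illustration, not a function of this package, and it assumes each U[k] has orthonormal columns.

```julia
using LinearAlgebra

# Hypothetical helper (not part of the package API): evaluate the KSS total cost
# for bases U[1],...,U[K] (assumed orthonormal) and assignments c[1],...,c[N].
function kss_cost(U, c, X)
    total = 0.0
    for i in axes(X, 2)
        x = view(X, :, i)
        Uk = U[c[i]]
        r = x - Uk * (Uk' * x)   # residual after projecting x onto span(Uk)
        total += sum(abs2, r)    # squared Euclidean norm of the residual
    end
    return total
end

# Two 1-dimensional subspaces of ℝ²: the two coordinate axes.
U = [reshape([1.0, 0.0], 2, 1), reshape([0.0, 1.0], 2, 1)]
X = [2.0 0.0 1.0;
     0.0 3.0 1.0]
c = [1, 2, 1]
cost = kss_cost(U, c, X)   # only the third point lies off its assigned axis
```

Here the first two points lie exactly on their assigned axes, so the whole cost comes from the third point.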
SubspaceClustering.kss_assign_clusters! — Method
kss_assign_clusters!(c, U, X)
Assign the N data points in X to the K subspaces in U, update the vector of assignments c, and return this vector of assignments.
See also kss_assign_clusters, kss.
SubspaceClustering.kss_assign_clusters — Method
kss_assign_clusters(U, X)
Assign the N data points in X to the K subspaces in U and return a vector of the assignments.
See also kss_assign_clusters!, kss.
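As an illustration of what an assignment step of this kind computes, the sketch below assigns each point to the subspace onto which its projection is largest. For orthonormal bases this is equivalent to choosing the subspace with the smallest residual in the KSS cost. `assign_clusters_sketch` is a hypothetical name, and the package's own implementation may differ (for example in how ties are broken).

```julia
using LinearAlgebra

# Hypothetical sketch (not the package implementation): assign each column of X
# to the subspace with the largest projection norm ‖U[k]' x‖, assuming each
# U[k] has orthonormal columns.
function assign_clusters_sketch(U, X)
    c = Vector{Int}(undef, size(X, 2))
    for i in axes(X, 2)
        x = view(X, :, i)
        c[i] = argmax([norm(Uk' * x) for Uk in U])
    end
    return c
end

# Two 1-dimensional subspaces of ℝ²: the two coordinate axes.
U = [reshape([1.0, 0.0], 2, 1), reshape([0.0, 1.0], 2, 1)]
X = [2.0 0.0 1.0;
     0.0 3.0 2.0]
c = assign_clusters_sketch(U, X)
```

The third point (1, 2) has the larger component along the second axis, so it joins the second cluster.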
SubspaceClustering.kss_estimate_subspace — Method
kss_estimate_subspace(Xk, dk)
Return the dk-dimensional subspace that best fits the data points in Xk.
See also kss.
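One standard way to find a best-fitting subspace in the least-squares sense is to take the leading left singular vectors of the data matrix. The sketch below shows that approach; it is an assumption about the technique, not a copy of the package's code, and `estimate_subspace_sketch` is a hypothetical name. It also assumes dk is no larger than the smaller dimension of Xk.

```julia
using LinearAlgebra

# Hypothetical sketch: the dk leading left singular vectors of Xk span the
# dk-dimensional subspace minimizing the sum of squared projection residuals.
function estimate_subspace_sketch(Xk, dk)
    F = svd(Xk)
    return F.U[:, 1:dk]   # basis matrix with orthonormal columns
end

# Three points that all lie on the line spanned by (1, 1, 0).
Xk = [1.0 2.0 -3.0;
      1.0 2.0 -3.0;
      0.0 0.0  0.0]
Uk = estimate_subspace_sketch(Xk, 1)
residual = norm(Xk - Uk * (Uk' * Xk))   # near zero: the points fit exactly
```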
SubspaceClustering.randsubspace! — Method
randsubspace!([rng=default_rng()], U::AbstractMatrix)
Set the D×d matrix U to be the basis matrix of a randomly generated d-dimensional subspace of ℝᴰ.
See also randsubspace.
SubspaceClustering.randsubspace — Method
randsubspace([rng=default_rng()], [T=Float64], D, d)
Generate a random d-dimensional subspace of ℝᴰ and return a basis matrix with element type T<:AbstractFloat.
See also randsubspace!.
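A common way to generate a random subspace is to orthonormalize a Gaussian random matrix. The sketch below does this with a QR factorization; whether the package uses this exact construction is an assumption, and `randsubspace_sketch` is a hypothetical name.

```julia
using LinearAlgebra, Random

# Hypothetical sketch: orthonormalize a D×d Gaussian matrix via QR.
# Matrix(Q) extracts the "thin" D×d factor with orthonormal columns.
function randsubspace_sketch(rng, D, d)
    Q = qr(randn(rng, D, d)).Q
    return Matrix(Q)
end

Usub = randsubspace_sketch(MersenneTwister(0), 5, 2)
```

Passing an explicit random number generator, as the documented functions allow, makes the result reproducible.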
SubspaceClustering.tsc — Method
tsc(X::AbstractMatrix{<:Real}, K::Integer;
max_nz = max(4, cld(size(X, 2), max(1, K))),
max_chunksize = 1000,
rng = default_rng(),
kmeans_nruns = 10,
kmeans_opts = (;))
Cluster the N data points in the D×N data matrix X into K clusters via the Thresholding-based Subspace Clustering (TSC) algorithm with affinity matrix formed using at most max_nz neighbors. Output is a TSCResult containing the resulting cluster assignments with the internally computed affinity matrix, embedding matrix, and K-means runs.
TSC treats the data points as nodes of a weighted graph. The edge weights come from an affinity matrix formed by taking the (transformed) absolute cosine similarities between every pair of points, thresholding them at max_nz neighbors, and then symmetrizing. Cluster assignments are then obtained via normalized spectral clustering of the graph.
Keyword arguments
max_nz::Integer = max(4, cld(size(X, 2), max(1, K))): maximum number of neighbors
max_chunksize::Integer = 1000: chunk size used in tsc_affinity
rng::AbstractRNG = default_rng(): random number generator used by K-means
kmeans_nruns::Integer = 10: number of K-means runs to perform
kmeans_opts = (;): additional options for kmeans
See also TSCResult, tsc_affinity, tsc_embedding.
SubspaceClustering.tsc_affinity — Method
tsc_affinity(X; max_nz = max(2, cld(size(X, 2), 4)), max_chunksize = 1000)
Compute the sparse TSC affinity (i.e., adjacency) matrix for the N data points in X formed by thresholding their pairwise absolute cosine similarities at max_nz neighbors then symmetrizing.
To handle datasets with a large number of points N, the computation is performed over chunks of at most max_chunksize points at a time.
See also tsc.
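The thresholding and symmetrizing steps can be illustrated with a small dense sketch. It is a simplification under several assumptions: it uses the raw absolute cosine similarities as weights (no transformation), symmetrizes by adding the transpose, builds a dense matrix, and does no chunking. `tsc_affinity_sketch` is a hypothetical name, and the package's sparse implementation may differ in each of these respects.

```julia
using LinearAlgebra

# Hypothetical dense sketch (not the package implementation).
# Assumes max_nz < N and that no column of X is zero.
function tsc_affinity_sketch(X, max_nz)
    N = size(X, 2)
    Xn = X ./ sqrt.(sum(abs2, X; dims = 1))   # normalize each column to unit length
    C = abs.(Xn' * Xn)                        # pairwise absolute cosine similarities
    C[diagind(C)] .= 0                        # a point is not its own neighbor
    Z = zeros(N, N)
    for j in 1:N
        # keep the max_nz largest similarities in column j
        keep = partialsortperm(C[:, j], 1:max_nz; rev = true)
        Z[keep, j] = C[keep, j]
    end
    return Z + Z'                             # symmetrize
end

# Two points along the first axis and two along the second.
X = [1.0 2.0 0.0 0.0;
     0.0 0.0 1.0 3.0]
A = tsc_affinity_sketch(X, 1)
```

With one neighbor per point, each point links only to the other point on its own axis, so the graph splits into two components.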
SubspaceClustering.tsc_embedding — Method
tsc_embedding(A, K)
Compute the K-dimensional TSC embedding for the N×N affinity matrix A, returning a K×N matrix of embeddings.
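For orientation, a textbook normalized spectral embedding can be sketched as follows: form the symmetric normalized Laplacian, take the eigenvectors for the K smallest eigenvalues, and normalize each point's embedding to unit length. This is an assumption about the general technique, not a description of the package's exact computation, and `tsc_embedding_sketch` is a hypothetical name.

```julia
using LinearAlgebra

# Hypothetical dense sketch of a normalized spectral embedding.
# Assumes A is symmetric and every node has positive degree.
function tsc_embedding_sketch(A, K)
    deg = vec(sum(A; dims = 2))
    s = 1 ./ sqrt.(deg)
    L = I - Diagonal(s) * A * Diagonal(s)      # symmetric normalized Laplacian
    F = eigen(Symmetric(Matrix(L)))            # eigenvalues in ascending order
    V = F.vectors[:, 1:K]                      # N×K, eigenvectors of the K smallest
    V = V ./ sqrt.(sum(abs2, V; dims = 2))     # normalize each row to unit length
    return permutedims(V)                      # K×N, one column per data point
end

# Affinity matrix of a graph with two components: nodes {1, 2} and {3, 4}.
A = [0.0 2.0 0.0 0.0;
     2.0 0.0 0.0 0.0;
     0.0 0.0 0.0 2.0;
     0.0 0.0 2.0 0.0]
E = tsc_embedding_sketch(A, 2)
```

Points in the same component receive the same embedding and points in different components receive orthogonal ones, which is what makes a subsequent K-means step effective.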