Thresholding-based subspace clustering (TSC)
Theory / Background
Syntax
The following function runs TSC:
SubspaceClustering.tsc — Function
tsc(X::AbstractMatrix{<:Real}, K::Integer;
max_nz = max(4, cld(size(X, 2), max(1, K))),
max_chunksize = 1000,
rng = default_rng(),
kmeans_nruns = 10,
kmeans_opts = (;))
Cluster the N data points in the D×N data matrix X into K clusters via the Thresholding-based Subspace Clustering (TSC) algorithm, with the affinity matrix formed using at most max_nz neighbors per point. The output is a TSCResult containing the resulting cluster assignments along with the internally computed affinity matrix, embedding matrix, and K-means runs.
TSC treats the data points as nodes of a weighted graph whose edge weights come from a thresholded affinity matrix: the (transformed) absolute cosine similarities between every pair of points are computed, only the max_nz largest are kept for each point, and the result is symmetrized. Cluster assignments are then obtained via normalized spectral clustering of this graph.
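The two steps described above can be sketched in plain Julia using only the standard library. This is a minimal sketch under stated assumptions, not the package's tsc_affinity or tsc_embedding: the names tsc_affinity_sketch and tsc_embedding_sketch are hypothetical, the similarity transform exp(-2 acos|cos θ|) is borrowed from the original TSC paper and assumed here, symmetrization is done by adding the matrix to its transpose, and the embedding takes the top-K eigenvectors of D^(-1/2) A D^(-1/2) with rows rescaled to unit norm (Ng-Jordan-Weiss style). The final K-means step on the columns of the embedding is omitted to stay within the standard library.

```julia
using LinearAlgebra

# Hypothetical sketch of a TSC affinity (not the package's tsc_affinity).
# ASSUMPTIONS: weights exp(-2 * acos(|cos θ|)) as in the original TSC paper,
# symmetrization by Z + Z', and no all-zero columns in X.
function tsc_affinity_sketch(X::AbstractMatrix{<:Real}, max_nz::Integer;
                             max_chunksize::Integer = 1000)
    N = size(X, 2)
    Y = X ./ sqrt.(sum(abs2, X; dims = 1))        # unit-norm columns
    Z = zeros(Float64, N, N)                      # one-sided thresholded affinity
    m = min(max_nz, N - 1)                        # at most N - 1 other points exist
    m < 1 && return Z
    for chunk in Iterators.partition(1:N, max_chunksize)
        S = abs.(Y' * Y[:, chunk])                # |cosine similarities|, N × length(chunk)
        for (k, j) in enumerate(chunk)
            s = S[:, k]
            s[j] = -Inf                           # never pick the point itself
            nbrs = partialsortperm(s, 1:m; rev = true)   # the m most similar points
            Z[nbrs, j] .= exp.(-2 .* acos.(clamp.(s[nbrs], 0, 1)))
        end
    end
    return Z + Z'                                 # symmetrize
end

# Hypothetical sketch of a normalized spectral embedding (not the package's tsc_embedding).
# ASSUMPTION: top-K eigenvectors of D^(-1/2) A D^(-1/2) (equivalently the K smallest
# of the normalized Laplacian), with each row rescaled to unit norm.
function tsc_embedding_sketch(A::AbstractMatrix{<:Real}, K::Integer)
    d = vec(sum(A; dims = 2))                     # node degrees
    dinv = 1 ./ sqrt.(max.(d, eps()))             # guard against isolated nodes
    M = Symmetric(dinv .* A .* dinv')
    V = eigen(M).vectors[:, end-K+1:end]          # eigenvalues come sorted ascending
    V = V ./ sqrt.(max.(sum(abs2, V; dims = 2), eps()))
    return permutedims(V)                         # K × N, like TSCResult.embedding
end

# Demo: 5 points on the x-axis and 5 on the y-axis of R^3 (two 1-D subspaces).
X = zeros(3, 10)
X[1, 1:5] .= [1.0, 2.0, -1.5, 0.5, 3.0]
X[2, 6:10] .= [1.0, -2.0, 0.7, 2.5, -0.3]
A = tsc_affinity_sketch(X, 3; max_chunksize = 4)
E = tsc_embedding_sketch(A, 2)
```

In the demo, every point's 3 nearest neighbors (by absolute cosine similarity) lie on its own line, so the affinity is block diagonal and each cluster collapses to a single unit vector in the embedding, the two vectors being orthogonal; K-means on the columns of E would then separate them trivially.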
Keyword arguments
- max_nz::Integer = max(4, cld(size(X, 2), max(1, K))): maximum number of neighbors
- max_chunksize::Integer = 1000: chunk size used in tsc_affinity
- rng::AbstractRNG = default_rng(): random number generator used by K-means
- kmeans_nruns::Integer = 10: number of K-means runs to perform
- kmeans_opts = (;): additional options for kmeans
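To see what the default max_nz works out to, the expression from the signature can be evaluated directly. Since cld(N, K) is roughly the number of points per cluster when clusters are balanced, by default each point keeps about one cluster's worth of neighbors, with a floor of 4. The helper index_chunks is hypothetical and only illustrates how max_chunksize might split the N points into column blocks; how tsc_affinity actually uses the chunk size is an assumption here.

```julia
# Default max_nz, copied from the tsc signature: N data points, K clusters.
default_max_nz(N::Integer, K::Integer) = max(4, cld(N, max(1, K)))

# Hypothetical helper: split the N point indices into blocks of at most
# max_chunksize, as a chunked similarity computation might (assumed usage).
index_chunks(N::Integer, max_chunksize::Integer) =
    collect(Iterators.partition(1:N, max_chunksize))

default_max_nz(300, 3)             # 100: about one balanced cluster's worth of points
default_max_nz(10, 5)              # 4: the floor applies since cld(10, 5) = 2
length.(index_chunks(2500, 1000))  # [1000, 1000, 500]
```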
See also TSCResult, tsc_affinity, tsc_embedding.
The output has the following type:
SubspaceClustering.TSCResult — Type
TSCResult{TA<:AbstractMatrix{<:Real}, TE<:AbstractMatrix{<:Real}, TK<:KmeansResult, Tc<:AbstractVector{<:Integer}}
The output of tsc.
Fields
- affinity::TA: N×N TSC affinity matrix
- embedding::TE: K×N TSC embedding matrix
- kmeans_runs::Vector{TK}: vector of outputs from batched K-means
- assignments::Tc: vector of final assignments
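The kmeans_runs and assignments fields can be inspected with ordinary Julia code. The sketch below uses a hypothetical stand-in type RunSketch instead of Clustering.jl's KmeansResult so that it runs with no packages, and it assumes, without confirmation from the package, that the final assignments come from the K-means run with the lowest total cost, a common convention when several runs are performed (kmeans_nruns). The helper names best_assignments and cluster_sizes are also hypothetical.

```julia
# Hypothetical stand-in holding the two KmeansResult fields used below
# (Clustering.jl's KmeansResult has `totalcost` and `assignments` fields).
struct RunSketch
    totalcost::Float64
    assignments::Vector{Int}
end

# ASSUMPTION: keep the assignments of the run with the smallest total cost.
function best_assignments(runs::AbstractVector)
    costs = [r.totalcost for r in runs]
    return runs[argmin(costs)].assignments
end

# Number of points assigned to each of the K clusters.
cluster_sizes(assignments::AbstractVector{<:Integer}, K::Integer) =
    [count(==(k), assignments) for k in 1:K]

runs = [RunSketch(5.2, [1, 1, 2, 2, 2]),
        RunSketch(3.1, [2, 2, 1, 1, 1]),
        RunSketch(4.0, [1, 2, 1, 2, 1])]
best = best_assignments(runs)   # [2, 2, 1, 1, 1]
cluster_sizes(best, 2)          # [3, 2]
```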