Thresholding-based subspace clustering (TSC)

Theory / Background

Todo

Write up mathematical background here

Syntax

The following function runs TSC:

SubspaceClustering.tsc (Function)
tsc(X::AbstractMatrix{<:Real}, K::Integer;
    max_nz = max(4, cld(size(X, 2), max(1, K))),
    max_chunksize = 1000,
    rng = default_rng(),
    kmeans_nruns = 10,
    kmeans_opts = (;))

Cluster the N data points in the D×N data matrix X into K clusters via the Thresholding-based Subspace Clustering (TSC) algorithm, with the affinity matrix formed using at most max_nz neighbors per point. Returns a TSCResult containing the resulting cluster assignments along with the internally computed affinity matrix, embedding matrix, and K-means runs.

TSC treats the data points as nodes of a weighted graph. The edge weights come from a thresholded affinity matrix: the (transformed) absolute cosine similarities between every pair of points are computed, only the largest ones (at most max_nz neighbors per point) are kept, and the result is symmetrized. Cluster assignments are then obtained via normalized spectral clustering of this graph.
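
The steps described above can be sketched in plain Julia. This is only an illustration under stated assumptions, not the package's implementation: the helper names sketch_affinity and sketch_embedding are hypothetical, the transform exp(-2 acos(c)) is assumed from the original TSC paper, and the row normalization of the embedding is an assumed (Ng-Jordan-Weiss style) choice. The final K-means step is omitted.

```julia
using LinearAlgebra

# Hypothetical helper (not part of SubspaceClustering): thresholded affinity.
function sketch_affinity(X::AbstractMatrix{<:Real}, max_nz::Integer)
    N = size(X, 2)
    # Normalize columns so that inner products are cosine similarities.
    colnorms = [norm(view(X, :, j)) for j in 1:N]
    Xn = X ./ max.(colnorms, eps())'
    C = abs.(Xn' * Xn)          # absolute cosine similarities
    C[diagind(C)] .= 0          # a point is not its own neighbor
    Z = zeros(Float64, N, N)
    q = min(max_nz, N - 1)
    for j in 1:N
        # Keep only the q largest similarities for point j.
        nbrs = partialsortperm(view(C, :, j), 1:q; rev = true)
        for i in nbrs
            # Assumed transform: exp(-2 acos(c)); clamp guards against rounding above 1.
            Z[i, j] = exp(-2 * acos(min(C[i, j], 1.0)))
        end
    end
    return Z + Z'               # symmetrize
end

# Hypothetical helper: K×N normalized spectral embedding of an affinity matrix.
function sketch_embedding(A::AbstractMatrix{<:Real}, K::Integer)
    N = size(A, 1)
    d = vec(sum(A; dims = 1))                  # node degrees
    dinv = 1 ./ sqrt.(max.(d, eps()))
    L = Symmetric(dinv .* A .* dinv')          # D^(-1/2) A D^(-1/2)
    F = eigen(L)                               # eigenvalues in ascending order
    V = F.vectors[:, end-K+1:end]              # top K eigenvectors (N×K)
    rownorms = [norm(view(V, i, :)) for i in 1:N]
    V = V ./ max.(rownorms, eps())             # assumed row normalization
    return permutedims(V)                      # K×N
end

# Six points on two orthogonal lines in R^3.
X = [1.0 2.0 -3.0 0.0  0.0 0.0;
     0.0 0.0  0.0 1.0 -2.0 4.0;
     0.0 0.0  0.0 0.0  0.0 0.0]
A = sketch_affinity(X, 2)
E = sketch_embedding(A, 2)
```

In this toy case the affinity matrix is block diagonal, so points on the same line share an embedding column, and running K-means on the columns of E would separate the two groups.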

Keyword arguments

  • max_nz::Integer = max(4, cld(size(X, 2), max(1, K))): maximum number of neighbors kept per point when forming the affinity matrix
  • max_chunksize::Integer = 1000: chunk size used in tsc_affinity
  • rng::AbstractRNG = default_rng(): random number generator used by K-means
  • kmeans_nruns::Integer = 10: number of K-means runs to perform
  • kmeans_opts = (;): additional options for kmeans

See also TSCResult, tsc_affinity, tsc_embedding.

The output has the following type:

SubspaceClustering.TSCResult (Type)

TSCResult{TA<:AbstractMatrix{<:Real}, TE<:AbstractMatrix{<:Real}, TK<:KmeansResult, Tc<:AbstractVector{<:Integer}}

The output of tsc.

Fields

  • affinity::TA : N×N TSC affinity matrix
  • embedding::TE : K×N TSC embedding matrix
  • kmeans_runs::Vector{TK} : vector of outputs from batched K-means
  • assignments::Tc : vector of final assignments
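
As a rough illustration of the call signature and the result fields documented above, the following sketch clusters synthetic data and inspects the output. It assumes the SubspaceClustering package is installed; the synthetic data, seeds, and variable names are made up for this example, and the sizes in the comments follow from the field descriptions rather than from a recorded run.

```julia
using SubspaceClustering   # assumes the package is installed
using Random

# Synthetic data (illustrative only): 20 points from each of two
# random 2-dimensional subspaces of R^5, stored as columns.
data_rng = MersenneTwister(0)
U1 = randn(data_rng, 5, 2)
U2 = randn(data_rng, 5, 2)
X = hcat(U1 * randn(data_rng, 2, 20), U2 * randn(data_rng, 2, 20))

# Pass an explicitly seeded generator via the documented rng keyword
# so that the K-means initializations use a known seed.
result = tsc(X, 2; rng = MersenneTwister(1), kmeans_nruns = 5)

result.assignments        # cluster label for each of the 40 points
size(result.affinity)     # N×N, here (40, 40)
size(result.embedding)    # K×N, here (2, 40)
length(result.kmeans_runs)
```

Since the rng keyword is documented as the generator used by K-means, repeating the call with an identically seeded generator should give the same assignments.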

Examples

TSC with equal subspace dimensions

Todo

Write up example here

TSC with different subspace dimensions

Todo

Write up example here

TSC with reproducible random number generation

Todo

Write up example here