API

SubspaceClustering.KSSResult - Type
KSSResult{
    TU<:AbstractVector{<:AbstractMatrix{<:AbstractFloat}},
    Tc<:AbstractVector{<:Integer},
    T<:Real}

The output of kss.

Fields

  • U::TU: vector of subspace basis matrices U[1],...,U[K]
  • c::Tc: vector of cluster assignments c[1],...,c[N]
  • iterations::Int: number of iterations performed
  • totalcost::T: final value of total cost function
  • counts::Vector{Int}: vector of cluster sizes counts[1],...,counts[K]
  • converged::Bool: final convergence status
SubspaceClustering.TSCResult - Type

TSCResult{
    TA<:AbstractMatrix{<:Real},
    TE<:AbstractMatrix{<:Real},
    TK<:KmeansResult,
    Tc<:AbstractVector{<:Integer}}

The output of tsc.

Fields

  • affinity::TA : N×N TSC affinity matrix
  • embedding::TE : K×N TSC embedding matrix
  • kmeans_runs::Vector{TK} : vector of outputs from batched K-means
  • assignments::Tc : vector of final assignments
SubspaceClustering.kss - Method
kss(X::AbstractMatrix{<:Real}, d::AbstractVector{<:Integer};
    maxiters = 100,
    rng = default_rng(),
    Uinit = [randsubspace(rng, size(X, 1), di) for di in d])

Cluster the N data points in the D×N data matrix X into K clusters via the K-subspaces (KSS) algorithm, where cluster k is modeled by a subspace of dimension d[k] (so K = length(d)). The output is a KSSResult containing the resulting cluster assignments c[1],...,c[N], the subspace basis matrices U[1],...,U[K], and metadata about the algorithm run.

KSS seeks to cluster data points by subspace, minimizing the following total cost

\[\sum_{i=1}^N \| X[:, i] - U[c[i]] U[c[i]]' X[:, i] \|_2^2\]

with respect to the cluster assignments c[1],...,c[N] and subspace basis matrices U[1],...,U[K].
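The total cost above can be evaluated directly for any given assignments and bases. The following is a minimal sketch, not the package's implementation: `kss_cost` is a hypothetical helper name, and it assumes each `U[k]` has orthonormal columns so that `U[k] * U[k]'` is an orthogonal projector.

```julia
using LinearAlgebra

# Hypothetical helper (not part of SubspaceClustering): evaluates
# sum_i || X[:, i] - U[c[i]] * U[c[i]]' * X[:, i] ||_2^2.
# Assumes every U[k] has orthonormal columns.
function kss_cost(X::AbstractMatrix{<:Real},
                  U::AbstractVector{<:AbstractMatrix},
                  c::AbstractVector{<:Integer})
    total = zero(float(eltype(X)))
    for i in axes(X, 2)
        x = view(X, :, i)
        Uk = U[c[i]]
        residual = x - Uk * (Uk' * x)   # part of x orthogonal to subspace c[i]
        total += sum(abs2, residual)    # squared Euclidean norm of the residual
    end
    return total
end

# Two 1-dimensional subspaces of R^3: span(e1) and span(e2).
U = [reshape([1.0, 0.0, 0.0], 3, 1), reshape([0.0, 1.0, 0.0], 3, 1)]
X = [2.0 0.0;
     0.0 3.0;
     1.0 0.0]

cost_good = kss_cost(X, U, [1, 2])   # residuals (0,0,1) and (0,0,0): cost 1.0
cost_bad  = kss_cost(X, U, [2, 1])   # residuals (2,0,1) and (0,3,0): cost 14.0
```

Swapping the two assignments raises the cost from 1.0 to 14.0, which is the kind of difference the assignment step of KSS exploits.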

Keyword arguments

  • maxiters::Integer = 100: maximum number of iterations
  • rng::AbstractRNG = default_rng(): random number generator (used when reinitializing the subspace for an empty cluster)
  • Uinit::AbstractVector{<:AbstractMatrix{<:AbstractFloat}} = [randsubspace(rng, size(X, 1), di) for di in d]: vector of K initial subspace basis matrices to use (each Uinit[k] should be D×d[k])

See also KSSResult.
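A possible call, based only on the signature and fields documented above, might look like the following sketch. The synthetic data (two random subspaces of dimensions 2 and 1 in a 5-dimensional space) and all variable names are assumptions made for illustration, and the exact results depend on the random initialization.

```julia
using Random, LinearAlgebra
using SubspaceClustering: kss

rng = MersenneTwister(0)
D = 5

# Illustrative data (an assumption, not from the package docs):
# 40 points on a random 2-dimensional subspace, 60 on a random 1-dimensional one.
B1 = Matrix(qr(randn(rng, D, 2)).Q)[:, 1:2]
B2 = Matrix(qr(randn(rng, D, 1)).Q)[:, 1:1]
X = hcat(B1 * randn(rng, 2, 40), B2 * randn(rng, 1, 60))

# One subspace dimension per cluster, so K = 2 here.
result = kss(X, [2, 1]; maxiters = 50, rng = rng)

result.c           # cluster assignment of each of the 100 points
result.counts      # cluster sizes
result.totalcost   # final value of the total cost
result.converged   # whether the run converged within maxiters
```

Because KSS depends on its initialization, it may help to run it with several seeds (or several `Uinit` choices) and keep the result with the smallest `totalcost`.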

SubspaceClustering.tsc - Method
tsc(X::AbstractMatrix{<:Real}, K::Integer;
    max_nz = max(4, cld(size(X, 2), max(1, K))),
    max_chunksize = 1000,
    rng = default_rng(),
    kmeans_nruns = 10,
    kmeans_opts = (;))

Cluster the N data points in the D×N data matrix X into K clusters via the Thresholding-based Subspace Clustering (TSC) algorithm, with the affinity matrix formed using at most max_nz neighbors per point. The output is a TSCResult containing the resulting cluster assignments along with the internally computed affinity matrix, embedding matrix, and K-means runs.

TSC treats the data points as nodes of a weighted graph. The edge weights come from a thresholded affinity matrix: the (transformed) absolute cosine similarities between every pair of points are thresholded at max_nz neighbors, and the result is then symmetrized. Cluster assignments are obtained via normalized spectral clustering of this graph.

Keyword arguments

  • max_nz::Integer = max(4, cld(size(X, 2), max(1, K))): maximum number of neighbors
  • max_chunksize::Integer = 1000: chunk size used in tsc_affinity
  • rng::AbstractRNG = default_rng(): random number generator used by K-means
  • kmeans_nruns::Integer = 10: number of K-means runs to perform
  • kmeans_opts = (;): additional options for kmeans

See also TSCResult, tsc_affinity, tsc_embedding.
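As a hedged usage sketch based only on the signature and the TSCResult fields documented above, a call might look like this. The synthetic data and variable names are assumptions made for illustration.

```julia
using Random, LinearAlgebra
using SubspaceClustering: tsc

rng = MersenneTwister(1)
D, K = 10, 2

# Illustrative data (an assumption): 50 points on each of two random
# 2-dimensional subspaces of a 10-dimensional space.
B1 = Matrix(qr(randn(rng, D, 2)).Q)[:, 1:2]
B2 = Matrix(qr(randn(rng, D, 2)).Q)[:, 1:2]
X = hcat(B1 * randn(rng, 2, 50), B2 * randn(rng, 2, 50))
N = size(X, 2)

result = tsc(X, K; max_nz = 10, kmeans_nruns = 5, rng = rng)

result.assignments        # final cluster assignment of each point
size(result.affinity)     # documented as N×N
size(result.embedding)    # documented as K×N
length(result.kmeans_runs)
```

Inspecting `result.affinity` and `result.embedding` can be useful when tuning `max_nz`, since both are intermediate quantities that the final K-means step depends on.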

SubspaceClustering.tsc_affinity - Method
tsc_affinity(X; max_nz = max(2, cld(size(X, 2), 4)), max_chunksize = 1000)

Compute the sparse TSC affinity (i.e., adjacency) matrix for the N data points in X formed by thresholding their pairwise absolute cosine similarities at max_nz neighbors then symmetrizing.

To handle datasets with a large number of points N, the computation is performed over chunks of at most max_chunksize points at a time.
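The thresholding and symmetrizing steps described above can be illustrated with a small dense sketch. This is not the package's implementation: `thresholded_affinity` is a hypothetical helper, it uses the raw absolute cosine similarities without any further transformation, it keeps exactly `q` neighbors per point, and it skips the chunking, so it only suits small N. It also assumes X has no zero columns and that `q < N`.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper (not part of SubspaceClustering): keep, for each point,
# its q largest absolute cosine similarities, then symmetrize.
function thresholded_affinity(X::AbstractMatrix{<:Real}, q::Integer)
    N = size(X, 2)
    Xn = X ./ sqrt.(sum(abs2, X; dims = 1))   # unit-norm columns (no zero columns assumed)
    S = abs.(Xn' * Xn)                        # pairwise absolute cosine similarities
    S[diagind(S)] .= 0                        # a point is not its own neighbor
    rows, cols, vals = Int[], Int[], Float64[]
    for j in 1:N
        # indices of the q points most similar to point j
        nbrs = partialsortperm(view(S, :, j), 1:q; rev = true)
        append!(rows, nbrs)
        append!(cols, fill(j, q))
        append!(vals, S[nbrs, j])
    end
    Z = sparse(rows, cols, vals, N, N)        # thresholded, generally not symmetric
    return Z + Z'                             # symmetrize
end

# Columns 1-2 lie near the first axis, columns 3-4 near the second.
X = [1.0 0.9 0.0 0.1;
     0.0 0.1 1.0 0.9]
A = thresholded_affinity(X, 1)
```

With one neighbor per point, the only nonzero entries of `A` connect points 1 and 2 and points 3 and 4, so the graph splits into the two expected groups.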

See also tsc.
