API
SubspaceClustering.SubspaceClustering — Module
Subspace clustering module. Provides algorithms for clustering data points by subspace.
SubspaceClustering.KASResult — Type
KASResult{
TUb<:Union{AbstractFloat,Complex{<:AbstractFloat}},
TU<:AbstractVector{<:AbstractMatrix{TUb}},
Tb<:AbstractVector{<:AbstractVector{TUb}},
Tc<:AbstractVector{<:Integer},
T<:Real}
The output of kas.
Fields
U::TU: vector of affine space basis matrices U[1],...,U[K]
b::Tb: vector of bias vectors b[1],...,b[K]
c::Tc: vector of cluster assignments c[1],...,c[N]
iterations::Int: number of iterations performed
totalcost::T: final value of total cost function
counts::Vector{Int}: vector of cluster sizes counts[1],...,counts[K]
converged::Bool: final convergence status
SubspaceClustering.KSSResult — Type
KSSResult{
TU<:AbstractVector{<:AbstractMatrix{<:Union{AbstractFloat,Complex{<:AbstractFloat}}}},
Tc<:AbstractVector{<:Integer},
T<:Real}
The output of kss.
Fields
U::TU: vector of subspace basis matrices U[1],...,U[K]
c::Tc: vector of cluster assignments c[1],...,c[N]
iterations::Int: number of iterations performed
totalcost::T: final value of total cost function
counts::Vector{Int}: vector of cluster sizes counts[1],...,counts[K]
converged::Bool: final convergence status
SubspaceClustering.TSCResult — Type
TSCResult{
TA<:AbstractMatrix{<:Real},
TE<:AbstractMatrix{<:Real},
TK<:KmeansResult,
Tc<:AbstractVector{<:Integer}}
The output of tsc.
Fields
affinity::TA: N×N TSC affinity matrix
embedding::TE: K×N TSC embedding matrix
kmeans_runs::Vector{TK}: vector of outputs from batched K-means
assignments::Tc: vector of final assignments
SubspaceClustering.kas — Method
kas(X::AbstractMatrix{<:Number}, d::AbstractVector{<:Integer};
maxiters = 100,
rng = default_rng(),
init = [(randsubspace(rng, float(eltype(X)), size(X, 1), di), zeros(float(eltype(X)), size(X, 1))) for di in d],
showprogress = false)
Cluster the N data points in the D×N data matrix X into K clusters via the K-affine-spaces (KAS) algorithm with corresponding affine space dimensions d[1],...,d[K]. Output is a KASResult containing the resulting cluster assignments c[1],...,c[N], affine space basis matrices U[1],...,U[K], bias vectors b[1],...,b[K], and metadata about the algorithm run.
KAS seeks to cluster data points by their affine space by minimizing the following total cost
\[\sum_{i=1}^N \| X[:, i] - (U[c[i]] U[c[i]]' (X[:, i] - b[c[i]]) + b[c[i]]) \|_2^2\]
with respect to the cluster assignments c[1],...,c[N], affine space basis matrices U[1],...,U[K], and bias vectors b[1],...,b[K].
Keyword arguments
maxiters::Integer = 100: maximum number of iterations
rng::AbstractRNG = default_rng(): random number generator (used when reinitializing the affine space for an empty cluster)
init::AbstractVector{<:Tuple{<:AbstractMatrix{TUb},<:AbstractVector{TUb}}} = [(randsubspace(rng, float(eltype(X)), size(X, 1), di), zeros(float(eltype(X)), size(X, 1))) for di in d]: vector of K initial pairs (U[k], b[k]), each consisting of an affine space basis matrix U[k] and a bias vector b[k], where TUb is a floating point type
showprogress::Bool = false: whether to log progress during the algorithm run
See also KASResult.
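As a concrete reading of the KAS total cost, the following minimal sketch evaluates it for given assignments, basis matrices, and bias vectors. The helper name kas_totalcost_sketch is hypothetical and not part of SubspaceClustering; it assumes each U[k] has orthonormal columns and uses only the LinearAlgebra standard library.

```julia
using LinearAlgebra

# Hypothetical helper (not part of the documented API): evaluate the KAS
# total cost for assignments c, basis matrices U, and bias vectors b.
function kas_totalcost_sketch(X, U, b, c)
    total = 0.0
    for i in axes(X, 2)
        k = c[i]
        r = X[:, i] - b[k]                       # shift by the bias vector
        total += norm(r - U[k] * (U[k]' * r))^2  # squared distance to the affine space
    end
    return total
end

# Two lines in the plane: the horizontal line y = 1 and the vertical line x = -1
U = [reshape([1.0, 0.0], 2, 1), reshape([0.0, 1.0], 2, 1)]
b = [[0.0, 1.0], [-1.0, 0.0]]
X = [2.0 -1.0 0.0;
     1.0  3.0 0.0]
c = [1, 2, 1]
cost = kas_totalcost_sketch(X, U, b, c)
```

The first two points lie exactly on their assigned lines, so only the third point (the origin, at distance 1 from the line y = 1) contributes to the cost.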
SubspaceClustering.kas_assign_clusters! — Method
kas_assign_clusters!(c, U, b, X)
Assign the N data points in X to the K affine spaces in (U,b), update the vector of assignments c, and return this vector of assignments.
See also kas_assign_clusters, kas.
SubspaceClustering.kas_assign_clusters — Method
kas_assign_clusters(U, b, X)
Assign the N data points in X to the K affine spaces in (U,b) and return a vector of the assignments.
See also kas_assign_clusters!, kas.
SubspaceClustering.kas_estimate_affinespace — Method
kas_estimate_affinespace(Xk, dk)
Return the dk-dimensional affine space that best fits the data points in Xk.
See also kas.
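One standard way to compute a least-squares best-fit affine space is to use the mean of the points as the bias vector and the leading dk left singular vectors of the centered data as the basis. The sketch below illustrates that approach. Whether kas_estimate_affinespace works exactly this way is an assumption, and estimate_affinespace_sketch is a hypothetical name.

```julia
using LinearAlgebra

# Hypothetical sketch: fit a dk-dimensional affine space via mean + SVD.
# Assumes Xk has at least dk columns and at least dk rows.
function estimate_affinespace_sketch(Xk, dk)
    bk = vec(sum(Xk; dims = 2)) ./ size(Xk, 2)  # mean of the points
    Uk = svd(Xk .- bk).U[:, 1:dk]               # leading left singular vectors
    return Uk, bk
end

# Four points lying exactly on the line {(t, 2t + 1)}
t = [-1.0, 0.0, 1.0, 2.0]
Xk = vcat(t', (2 .* t .+ 1)')
Uk, bk = estimate_affinespace_sketch(Xk, 1)

# Residual after projecting the centered points onto the fitted direction
residual = (Xk .- bk) - Uk * (Uk' * (Xk .- bk))
```

Because the points lie exactly on a line, the residual is zero up to floating point error.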
SubspaceClustering.kss — Method
kss(X::AbstractMatrix{<:Number}, d::AbstractVector{<:Integer};
maxiters = 100,
rng = default_rng(),
Uinit = [randsubspace(rng, size(X, 1), di) for di in d],
showprogress = false)
Cluster the N data points in the D×N data matrix X into K clusters via the K-subspaces (KSS) algorithm with corresponding subspace dimensions d[1],...,d[K]. Output is a KSSResult containing the resulting cluster assignments c[1],...,c[N], subspace basis matrices U[1],...,U[K], and metadata about the algorithm run.
KSS seeks to cluster data points by their subspace by minimizing the following total cost
\[\sum_{i=1}^N \| X[:, i] - U[c[i]] U[c[i]]' X[:, i] \|_2^2\]
with respect to the cluster assignments c[1],...,c[N] and subspace basis matrices U[1],...,U[K].
Keyword arguments
maxiters::Integer = 100: maximum number of iterations
rng::AbstractRNG = default_rng(): random number generator (used when reinitializing the subspace for an empty cluster)
Uinit::AbstractVector{<:AbstractMatrix{T}} = [randsubspace(rng, float(eltype(X)), size(X, 1), di) for di in d]: vector of K initial subspace basis matrices to use (each Uinit[k] should be D×d[k] and have eltype T where T is a floating point type)
showprogress::Bool = false: whether to log progress during the algorithm run
See also KSSResult.
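The KSS cost is typically minimized by alternating between reassigning points and refitting subspaces. The following minimal sketch shows that alternating structure using only LinearAlgebra. It is not the package's implementation: the name kss_sketch is hypothetical, empty clusters are left unchanged rather than reinitialized, and convergence is declared as soon as the assignments stop changing.

```julia
using LinearAlgebra

# Hypothetical sketch of alternating minimization for the KSS cost.
function kss_sketch(X, d, Uinit; maxiters = 100)
    U = copy(Uinit)
    c = zeros(Int, size(X, 2))
    for _ in 1:maxiters
        # Assignment step: for orthonormal U[k], minimizing norm(x - U[k]*U[k]'*x)
        # is equivalent to maximizing norm(U[k]' * x).
        cnew = [argmax([norm(Uk' * x) for Uk in U]) for x in eachcol(X)]
        cnew == c && break
        c = cnew
        # Update step: refit each subspace as the top d[k] left singular
        # vectors of its assigned points (skipped if too few points).
        for k in eachindex(U)
            Xk = X[:, c .== k]
            size(Xk, 2) >= d[k] && (U[k] = svd(Xk).U[:, 1:d[k]])
        end
    end
    return U, c
end

# Points on two lines through the origin: the x-axis and the y-axis
X = [1.0 -2.0 3.0 0.0  0.0 0.0;
     0.0  0.0 0.0 1.0 -1.5 2.0]
Uinit = [reshape([1.0, 0.2], 2, 1) ./ norm([1.0, 0.2]),
         reshape([0.2, 1.0], 2, 1) ./ norm([0.2, 1.0])]
U, c = kss_sketch(X, [1, 1], Uinit)
```

With this initialization the first three points are assigned to the first subspace and the last three to the second, and the refitted bases align with the coordinate axes (up to sign).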
SubspaceClustering.kss_assign_clusters! — Method
kss_assign_clusters!(c, U, X)
Assign the N data points in X to the K subspaces in U, update the vector of assignments c, and return this vector of assignments.
See also kss_assign_clusters, kss.
SubspaceClustering.kss_assign_clusters — Method
kss_assign_clusters(U, X)
Assign the N data points in X to the K subspaces in U and return a vector of the assignments.
See also kss_assign_clusters!, kss.
SubspaceClustering.kss_estimate_subspace — Method
kss_estimate_subspace(Xk, dk)
Return the dk-dimensional subspace that best fits the data points in Xk.
See also kss.
SubspaceClustering.randsubspace! — Method
randsubspace!([rng=default_rng()], U::AbstractMatrix{T})
Set the D×d matrix U to be the basis matrix of a randomly generated d-dimensional subspace of ℝᴰ (if T<:Real) or ℂᴰ (if T<:Complex), where T must be a floating point type.
See also randsubspace.
SubspaceClustering.randsubspace — Method
randsubspace([rng=default_rng()], [T=Float64], D, d)
Generate a random d-dimensional subspace of ℝᴰ (if T<:Real) or ℂᴰ (if T<:Complex) and return a D×d orthonormal basis matrix with elements of type T (T must be a floating point type).
See also randsubspace!.
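A common way to draw an orthonormal basis for a random subspace is to orthonormalize a Gaussian random matrix with a QR factorization. The sketch below shows that construction for both real and complex element types. That randsubspace uses this particular construction is an assumption, and randsubspace_sketch is a hypothetical name.

```julia
using LinearAlgebra, Random

# Hypothetical sketch: orthonormal basis of a random d-dimensional subspace.
function randsubspace_sketch(rng, T, D, d)
    G = randn(rng, T, D, d)   # Gaussian matrix (real or complex, depending on T)
    return Matrix(qr(G).Q)    # thin Q factor: D×d with orthonormal columns
end

rng = MersenneTwister(0)
U = randsubspace_sketch(rng, Float64, 5, 2)
Uc = randsubspace_sketch(rng, ComplexF64, 4, 3)
```

In both cases the returned matrix U satisfies U' * U ≈ I, which is the property the KSS and KAS cost functions rely on.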
SubspaceClustering.tsc — Method
tsc(X::AbstractMatrix{<:Real}, K::Integer;
max_nz = max(4, cld(size(X, 2), max(1, K))),
max_chunksize = 1000,
rng = default_rng(),
kmeans_nruns = 10,
kmeans_opts = (;),
showprogress = false)
Cluster the N data points in the D×N data matrix X into K clusters via the Thresholding-based Subspace Clustering (TSC) algorithm with affinity matrix formed using at most max_nz neighbors. Output is a TSCResult containing the resulting cluster assignments with the internally computed affinity matrix, embedding matrix, and K-means runs.
TSC clusters data points by treating them as nodes of a weighted graph whose weights come from a thresholded affinity matrix: the (transformed) absolute cosine similarities between every pair of points are thresholded at max_nz neighbors and then symmetrized. Cluster assignments are then obtained via normalized spectral clustering of the graph.
Keyword arguments
max_nz::Integer = max(4, cld(size(X, 2), max(1, K))): maximum number of neighbors
max_chunksize::Integer = 1000: chunk size used in tsc_affinity
rng::AbstractRNG = default_rng(): random number generator used by K-means
kmeans_nruns::Integer = 10: number of K-means runs to perform
kmeans_opts = (;): additional options for kmeans
showprogress::Bool = false: whether to log progress during the algorithm run
See also TSCResult, tsc_affinity, tsc_embedding.
SubspaceClustering.tsc_affinity — Method
tsc_affinity(X; max_nz = max(2, cld(size(X, 2), 4)), max_chunksize = 1000,
showprogress = false)
Compute the sparse TSC affinity (i.e., adjacency) matrix for the N data points in X formed by thresholding their pairwise absolute cosine similarities at max_nz neighbors then symmetrizing.
To handle datasets with a large number of points N, the computation is performed over chunks of at most max_chunksize points at a time.
See also tsc.
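To make the thresholding idea concrete, here is a small dense sketch: compute absolute cosine similarities, keep the max_nz largest entries in each column, and symmetrize. It makes several simplifying assumptions that may differ from tsc_affinity: no transform is applied to the similarities, no chunking or sparse storage is used, the symmetrization is taken to be Z + Z', and max_nz must be less than N. The name affinity_sketch is hypothetical.

```julia
using LinearAlgebra

# Hypothetical dense sketch of a thresholded, symmetrized affinity matrix.
function affinity_sketch(X, max_nz)
    N = size(X, 2)
    norms = [norm(x) for x in eachcol(X)]
    Xn = X ./ norms'                 # normalize each column to unit length
    C = abs.(Xn' * Xn)               # pairwise absolute cosine similarities
    C[diagind(C)] .= 0               # ignore self-similarity
    Z = zeros(eltype(C), N, N)
    for j in 1:N
        # indices of the max_nz most similar points to point j
        keep = partialsortperm(C[:, j], 1:max_nz; rev = true)
        Z[keep, j] = C[keep, j]
    end
    return Z + Z'                    # symmetrize
end

# Two points near the x-axis followed by two points near the y-axis
X = [1.0 1.0 0.0 0.1;
     0.0 0.1 1.0 1.0]
A = affinity_sketch(X, 1)
```

With max_nz = 1, each point keeps only its nearest neighbor in angle, so the resulting graph connects points 1 and 2 and points 3 and 4, with no edges between the two groups.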
SubspaceClustering.tsc_embedding — Method
tsc_embedding(A, K)
Compute the K-dimensional TSC embedding for the N×N affinity matrix A, returning a K×N matrix of embeddings.
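In normalized spectral clustering, an embedding of this shape is commonly built from the eigenvectors of the degree-normalized affinity matrix. The sketch below follows that general recipe. It is an illustration under assumptions, not the documented behavior of tsc_embedding: it uses a dense eigendecomposition, assumes every node has nonzero degree, and omits any further row or column normalization. The name embedding_sketch is hypothetical.

```julia
using LinearAlgebra

# Hypothetical sketch of a K-dimensional spectral embedding (K×N output).
function embedding_sketch(A, K)
    deg = vec(sum(A; dims = 2))                # node degrees
    Dinvsqrt = Diagonal(1 ./ sqrt.(deg))
    L = Symmetric(Dinvsqrt * A * Dinvsqrt)     # degree-normalized affinity
    V = eigen(L).vectors[:, end-K+1:end]       # eigenvectors of the K largest eigenvalues
    return permutedims(V)                      # one embedded point per column
end

# Affinity matrix of a graph with two connected components: {1, 2} and {3, 4}
A = [0.0 1.0 0.0 0.0;
     1.0 0.0 0.0 0.0;
     0.0 0.0 0.0 1.0;
     0.0 0.0 1.0 0.0]
E = embedding_sketch(A, 2)
```

For this graph, points in the same connected component receive identical embeddings while the two components are mapped to distinct locations, which is what makes the subsequent K-means step easy.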
SubspaceClustering.@logprogressif — Macro
@logprogressif cond [name] progress [key1=val1 [key2=val2 ...]]
Conditional version of @logprogress that only logs progress if cond is true, in which case it passes the remaining arguments to @logprogress. Otherwise, it does nothing.
See also @withprogressif.
SubspaceClustering.@withprogressif — Macro
@withprogressif cond [name=""] [parentid=uuid4()] ex
Conditional version of @withprogress that only sets up a progress bar if cond is true, in which case it passes the remaining arguments to @withprogress. Otherwise, it executes ex directly.
See also @logprogressif.