API

SubspaceClustering.KASResult - Type
KASResult{
    TUb<:Union{AbstractFloat,Complex{<:AbstractFloat}},
    TU<:AbstractVector{<:AbstractMatrix{TUb}},
    Tb<:AbstractVector{<:AbstractVector{TUb}},
    Tc<:AbstractVector{<:Integer},
    T<:Real}

The output of kas.

Fields

  • U::TU: vector of affine space basis matrices U[1],...,U[K]
  • b::Tb: vector of bias vectors b[1],...,b[K]
  • c::Tc: vector of cluster assignments c[1],...,c[N]
  • iterations::Int: number of iterations performed
  • totalcost::T: final value of total cost function
  • counts::Vector{Int}: vector of cluster sizes counts[1],...,counts[K]
  • converged::Bool: final convergence status
SubspaceClustering.KSSResult - Type
KSSResult{
    TU<:AbstractVector{<:AbstractMatrix{<:Union{AbstractFloat,Complex{<:AbstractFloat}}}},
    Tc<:AbstractVector{<:Integer},
    T<:Real}

The output of kss.

Fields

  • U::TU: vector of subspace basis matrices U[1],...,U[K]
  • c::Tc: vector of cluster assignments c[1],...,c[N]
  • iterations::Int: number of iterations performed
  • totalcost::T: final value of total cost function
  • counts::Vector{Int}: vector of cluster sizes counts[1],...,counts[K]
  • converged::Bool: final convergence status
SubspaceClustering.TSCResult - Type
TSCResult{
    TA<:AbstractMatrix{<:Real},
    TE<:AbstractMatrix{<:Real},
    TK<:KmeansResult,
    Tc<:AbstractVector{<:Integer}}

The output of tsc.

Fields

  • affinity::TA: N×N TSC affinity matrix
  • embedding::TE: K×N TSC embedding matrix
  • kmeans_runs::Vector{TK}: vector of outputs from batched K-means
  • assignments::Tc: vector of final cluster assignments
SubspaceClustering.kas - Method
kas(X::AbstractMatrix{<:Number}, d::AbstractVector{<:Integer};
    maxiters = 100,
    rng = default_rng(),
    init = [(randsubspace(rng, float(eltype(X)), size(X, 1), di), zeros(float(eltype(X)), size(X, 1))) for di in d],
    showprogress = false)

Cluster the N data points in the D×N data matrix X into K clusters via the K-affine-spaces (KAS) algorithm with corresponding affine space dimensions d[1],...,d[K]. Output is a KASResult containing the resulting cluster assignments c[1],...,c[N], affine space basis matrices U[1],...,U[K], bias vectors b[1],...,b[K], and metadata about the algorithm run.

KAS seeks to cluster data points by their affine space by minimizing the following total cost

\[\sum_{i=1}^N \| X[:, i] - (U[c[i]] U[c[i]]' (X[:, i] - b[c[i]]) + b[c[i]]) \|_2^2\]

with respect to the cluster assignments c[1],...,c[N], affine space basis matrices U[1],...,U[K], and bias vectors b[1],...,b[K].
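To make the cost concrete, here is a minimal sketch that evaluates it for given assignments, bases, and bias vectors. The helper name `kas_cost` is hypothetical (it is not part of the package), and the sketch assumes each `U[k]` has orthonormal columns.

```julia
# Hypothetical helper, not part of SubspaceClustering: evaluates the KAS total cost.
# Assumes each U[k] has orthonormal columns.
function kas_cost(X::AbstractMatrix, U::AbstractVector, b::AbstractVector,
                  c::AbstractVector{<:Integer})
    total = zero(real(float(eltype(X))))
    for i in axes(X, 2)
        k = c[i]
        r = X[:, i] - b[k]           # shift the point by the bias vector
        r -= U[k] * (U[k]' * r)      # remove the component lying in span(U[k])
        total += sum(abs2, r)        # squared distance to the affine space
    end
    return total
end

# Two 1-dimensional affine spaces in the plane: the lines y = 1 and x = -1
U = [reshape([1.0, 0.0], 2, 1), reshape([0.0, 1.0], 2, 1)]
b = [[0.0, 1.0], [-1.0, 0.0]]
X = [2.0 -1.0 0.0;
     1.0  3.0 0.0]
c = [1, 2, 1]

cost = kas_cost(X, U, b, c)
```

The first two points lie exactly on their assigned lines, so only the third point (the origin, at distance 1 from the line y = 1) contributes to the cost.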

Keyword arguments

  • maxiters::Integer = 100: maximum number of iterations
  • rng::AbstractRNG = default_rng(): random number generator (used when reinitializing the affine space for an empty cluster)
  • init::AbstractVector{<:Tuple{<:AbstractMatrix{TUb},<:AbstractVector{TUb}}} = [(randsubspace(rng, float(eltype(X)), size(X, 1), di), zeros(float(eltype(X)), size(X, 1))) for di in d]: vector of K initial pairs (U[k], b[k]), each consisting of an affine space basis matrix U[k] and a bias vector b[k] (each U[k] should be D×d[k], each b[k] should have length D, and both should have eltype TUb where TUb is a floating point type)
  • showprogress::Bool = false: whether to log progress during the algorithm run

See also KASResult.

SubspaceClustering.kss - Method
kss(X::AbstractMatrix{<:Number}, d::AbstractVector{<:Integer};
    maxiters = 100,
    rng = default_rng(),
    Uinit = [randsubspace(rng, float(eltype(X)), size(X, 1), di) for di in d],
    showprogress = false)

Cluster the N data points in the D×N data matrix X into K clusters via the K-subspaces (KSS) algorithm with corresponding subspace dimensions d[1],...,d[K]. Output is a KSSResult containing the resulting cluster assignments c[1],...,c[N], subspace basis matrices U[1],...,U[K], and metadata about the algorithm run.

KSS seeks to cluster data points by their subspace by minimizing the following total cost

\[\sum_{i=1}^N \| X[:, i] - U[c[i]] U[c[i]]' X[:, i] \|_2^2\]

with respect to the cluster assignments c[1],...,c[N] and subspace basis matrices U[1],...,U[K].
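As an illustration of this cost, the sketch below evaluates it for two different assignment vectors. The helper name `kss_cost` is hypothetical (it is not part of the package), and each `U[k]` is assumed to have orthonormal columns.

```julia
# Hypothetical helper, not part of SubspaceClustering: evaluates the KSS total cost.
# Assumes each U[k] has orthonormal columns.
function kss_cost(X::AbstractMatrix, U::AbstractVector, c::AbstractVector{<:Integer})
    total = zero(real(float(eltype(X))))
    for i in axes(X, 2)
        x = X[:, i]
        r = x - U[c[i]] * (U[c[i]]' * x)   # residual after projecting onto the subspace
        total += sum(abs2, r)
    end
    return total
end

# Subspace 1 is the x-axis of ℝ³; subspace 2 is the yz-plane
U = [reshape([1.0, 0.0, 0.0], 3, 1), [0.0 0.0; 1.0 0.0; 0.0 1.0]]
X = [3.0 1.0;
     0.0 2.0;
     0.0 2.0]

cost_good = kss_cost(X, U, [1, 2])   # second point assigned to the yz-plane
cost_bad  = kss_cost(X, U, [1, 1])   # second point assigned to the x-axis
```

Assigning the second point to the plane leaves a smaller residual than assigning it to the line, which is the kind of comparison an assignment update has to make for each point.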

Keyword arguments

  • maxiters::Integer = 100: maximum number of iterations
  • rng::AbstractRNG = default_rng(): random number generator (used when reinitializing the subspace for an empty cluster)
  • Uinit::AbstractVector{<:AbstractMatrix{T}} = [randsubspace(rng, float(eltype(X)), size(X, 1), di) for di in d]: vector of K initial subspace basis matrices to use (each Uinit[k] should be D×d[k] and have eltype T where T is a floating point type)
  • showprogress::Bool = false: whether to log progress during the algorithm run

See also KSSResult.

SubspaceClustering.randsubspace! - Method
randsubspace!([rng=default_rng()], U::AbstractMatrix{T})

Set the D×d matrix U to be the basis matrix of a randomly generated d-dimensional subspace of ℝᴰ (if T<:Real) or ℂᴰ (if T<:Complex), where T must be a floating point type.

See also randsubspace.

SubspaceClustering.randsubspace - Method
randsubspace([rng=default_rng()], [T=Float64], D, d)

Generate a random d-dimensional subspace of ℝᴰ (if T<:Real) or ℂᴰ (if T<:Complex) and return a D×d orthonormal basis matrix with elements of type T (T must be a floating point type).

See also randsubspace!.
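One common way to obtain such a basis is to orthonormalize a matrix of Gaussian entries with a QR factorization. The sketch below shows that approach using only the standard library; it is not necessarily how the package implements randsubspace, and the helper name `random_orthonormal_basis` is hypothetical.

```julia
using LinearAlgebra, Random

# Hypothetical helper (an assumption, not the package's implementation):
# orthonormalize a D×d Gaussian matrix to get a basis of a random subspace.
function random_orthonormal_basis(rng::AbstractRNG, ::Type{T}, D::Integer, d::Integer) where {T}
    G = randn(rng, T, D, d)    # Gaussian entries (complex Gaussian if T<:Complex)
    return Matrix(qr(G).Q)     # thin Q factor: D×d with orthonormal columns
end

U = random_orthonormal_basis(MersenneTwister(0), Float64, 5, 2)
Uc = random_orthonormal_basis(MersenneTwister(0), ComplexF64, 4, 3)
```

Either result can be checked for orthonormality with `U' * U ≈ I`.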

SubspaceClustering.tsc - Method
tsc(X::AbstractMatrix{<:Real}, K::Integer;
    max_nz = max(4, cld(size(X, 2), max(1, K))),
    max_chunksize = 1000,
    rng = default_rng(),
    kmeans_nruns = 10,
    kmeans_opts = (;),
    showprogress = false)

Cluster the N data points in the D×N data matrix X into K clusters via the Thresholding-based Subspace Clustering (TSC) algorithm with affinity matrix formed using at most max_nz neighbors. Output is a TSCResult containing the resulting cluster assignments with the internally computed affinity matrix, embedding matrix, and K-means runs.

TSC treats the data points as nodes of a weighted graph whose edge weights are given by a thresholded affinity matrix. This matrix is formed by thresholding the (transformed) absolute cosine similarities between every pair of points at max_nz neighbors and then symmetrizing. Cluster assignments are then obtained via normalized spectral clustering of the graph.

Keyword arguments

  • max_nz::Integer = max(4, cld(size(X, 2), max(1, K))): maximum number of neighbors
  • max_chunksize::Integer = 1000: chunk size used in tsc_affinity
  • rng::AbstractRNG = default_rng(): random number generator used by K-means
  • kmeans_nruns::Integer = 10: number of K-means runs to perform
  • kmeans_opts = (;): additional options for kmeans
  • showprogress::Bool = false: whether to log progress during the algorithm run

See also TSCResult, tsc_affinity, tsc_embedding.

SubspaceClustering.tsc_affinity - Method
tsc_affinity(X; max_nz = max(2, cld(size(X, 2), 4)), max_chunksize = 1000,
                showprogress = false)

Compute the sparse TSC affinity (i.e., adjacency) matrix for the N data points in X formed by thresholding their pairwise absolute cosine similarities at max_nz neighbors then symmetrizing.

To handle datasets with a large number of points N, the computation is performed over chunks of at most max_chunksize points at a time.
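The thresholding idea can be sketched with a small dense version. This is an illustration under several assumptions, not the package's implementation: the helper name `thresholded_affinity` is hypothetical, there is no chunking or sparse storage, the raw absolute cosine similarity is used as the edge weight without any further transformation, a point is never counted as its own neighbor, symmetrization is done by adding the transpose, and every column of X is assumed to be nonzero.

```julia
using LinearAlgebra

# Hypothetical dense sketch (not the package's tsc_affinity).
function thresholded_affinity(X::AbstractMatrix{<:Real}, max_nz::Integer)
    N = size(X, 2)
    Xn = X ./ sqrt.(sum(abs2, X; dims = 1))   # scale each column to unit length
    C = abs.(Xn' * Xn)                        # pairwise absolute cosine similarities
    C[diagind(C)] .= 0                        # assumption: exclude self-similarity
    A = zeros(eltype(C), N, N)
    for j in 1:N
        # indices of the (at most) max_nz most similar points to point j
        nbrs = partialsortperm(view(C, :, j), 1:min(max_nz, N - 1); rev = true)
        A[nbrs, j] .= C[nbrs, j]
    end
    return A + A'                             # assumption: symmetrize by adding the transpose
end

# Points 1 and 2 lie near the x-axis; points 3 and 4 lie near the y-axis
X = [1.0 0.9 0.0 0.1;
     0.0 0.1 1.0 0.9]
A = thresholded_affinity(X, 1)
```

With a single neighbor per point, the only nonzero weights connect points 1 and 2 and points 3 and 4, so the graph splits into the two expected groups.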

See also tsc.

SubspaceClustering.@logprogressif - Macro
@logprogressif cond [name] progress [key1=val1 [key2=val2 ...]]

Conditional version of @logprogress that only logs progress if cond is true, in which case it passes the remaining arguments to @logprogress. Otherwise, it does nothing.

See also @withprogressif.

SubspaceClustering.@withprogressif - Macro
@withprogressif cond [name=""] [parentid=uuid4()] ex

Conditional version of @withprogress that only sets up a progress bar if cond is true, in which case it passes the remaining arguments to @withprogress. Otherwise, it executes ex directly.

See also @logprogressif.
