Canopy clustering algorithm

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Endpoint (talk | contribs) at 20:45, 2 December 2007 (Added overview of algorithm). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The canopy clustering algorithm is an unsupervised clustering algorithm related to the K-means algorithm.

It is intended to speed up clustering on large data sets, where applying another algorithm directly may be impractical because of the size of the data set.

The algorithm proceeds as follows:

  • Cheaply partition the data into overlapping subsets, called 'canopies'.
  • Perform more expensive clustering, but only within these canopies. This reduces the number of training instances (and, for K-means, the number of clusters) that must be compared at each step.
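
The two steps above can be illustrated with a minimal Python sketch. It assumes one common formulation of the cheap partitioning step, which is not spelled out in this article: a loose threshold and a tight threshold, with every point inside the loose threshold of a chosen center joining that center's canopy, and every point inside the tight threshold being dropped as a candidate for future centers. The names `cheap_distance`, `build_canopies`, `candidate_pairs`, `t_loose` and `t_tight` are hypothetical, and Manhattan distance merely stands in for whatever inexpensive measure suits the data.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def cheap_distance(a: Point, b: Point) -> float:
    """Manhattan distance, used here as a stand-in for an inexpensive measure."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_canopies(
    points: Sequence[Point],
    t_loose: float,
    t_tight: float,
    distance: Callable[[Point, Point], float] = cheap_distance,
) -> List[Tuple[int, List[int]]]:
    """Return (center index, member indices) pairs; canopies may overlap.

    Assumes the two-threshold scheme with t_loose >= t_tight >= 0.
    """
    if not (t_loose >= t_tight >= 0):
        raise ValueError("expected t_loose >= t_tight >= 0")

    remaining = list(range(len(points)))  # indices still eligible as centers
    canopies: List[Tuple[int, List[int]]] = []
    while remaining:
        center = remaining[0]
        # Every point within the loose threshold joins this canopy,
        # including points that already belong to earlier canopies.
        members = [
            i for i in range(len(points))
            if distance(points[center], points[i]) <= t_loose
        ]
        canopies.append((center, members))
        # Points within the tight threshold (the center included, at
        # distance 0) can no longer start a canopy of their own.
        remaining = [
            i for i in remaining
            if distance(points[center], points[i]) > t_tight
        ]
    return canopies


def candidate_pairs(canopies: List[Tuple[int, List[int]]]) -> Set[Tuple[int, int]]:
    """Pairs of points sharing at least one canopy.

    Only these pairs would need the expensive comparison in the second step.
    """
    pairs: Set[Tuple[int, int]] = set()
    for _, members in canopies:
        pairs.update(combinations(sorted(members), 2))
    return pairs
```

For five points on a line at 0, 1, 2, 10 and 11 with `t_loose=3` and `t_tight=1.5`, this sketch groups the first three points and the last two into separate canopies, so the expensive step would compare 4 pairs instead of all 10. The choice of thresholds and of the cheap distance measure is application dependent.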

See also