Canopy clustering algorithm

Template:Wikify is deprecated. Please use a more specific cleanup template as listed in the documentation.

The canopy clustering algorithm in computing is an unsupervised clustering algorithm related to the K-means algorithm.

It is intended to speed up clustering operations on large data sets, where using another algorithm directly may be impractical because of the size of the data set.

The algorithm proceeds as follows:

Cheaply partitioning the data into overlapping subsets, called 'canopies'
Perform more expensive clustering, but only within these canopies

Benefits

The number of instances of training data that must be compared at each step is reduced
There is some evidence that the resulting clusters are improved

References

This algorithms or data structures-related article is a stub. You can help Wikipedia by expanding it.