BFR algorithm: Difference between revisions


Revision as of 16:48, 18 May 2018

The BFR algorithm, named after its inventors Bradley, Fayyad and Reina, is a variant of the k-means algorithm designed to cluster data in a high-dimensional Euclidean space. It makes a very strong assumption about the shape of clusters: the points of each cluster must be normally distributed about a centroid. The mean and standard deviation of a cluster may differ from one dimension to another, but the dimensions must be independent. [1]
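One consequence of this assumption can be illustrated with a short sketch. If the dimensions are independent and each has its own mean and standard deviation, the distance from a point to a centroid can be scaled per dimension by that dimension's standard deviation. The function name `normalized_distance` and the sample values are illustrative assumptions, not part of the cited text, and this is not the full BFR procedure.

```python
import math


def normalized_distance(point, mean, std):
    """Distance from `point` to a centroid `mean`, with each dimension
    scaled by that dimension's standard deviation.

    Assumes independent (axis-aligned) dimensions, so no covariance
    terms are needed. All names here are hypothetical.
    """
    if not (len(point) == len(mean) == len(std)):
        raise ValueError("point, mean and std must have the same length")
    if any(s <= 0 for s in std):
        raise ValueError("standard deviations must be positive")
    return math.sqrt(
        sum(((x - m) / s) ** 2 for x, m, s in zip(point, mean, std))
    )


# Hypothetical cluster: tight along the first axis, wide along the second.
centroid = [0.0, 0.0]
spread = [1.0, 10.0]

# Both points are 3 units from the centroid in plain Euclidean terms,
# but they differ once each axis is scaled by its own spread.
d_a = normalized_distance([3.0, 0.0], centroid, spread)
d_b = normalized_distance([0.0, 3.0], centroid, spread)
```

Under these assumed values, the point displaced along the wide axis ends up much closer to the centroid in normalized terms than the point displaced along the tight axis, which is why a per-dimension standard deviation matters for deciding cluster membership.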

[1] Rajaraman, Anand; Ullman, Jeffrey; Leskovec, Jure (2011). Mining of Massive Datasets. New York, NY, USA: Cambridge University Press. pp. 257–258. ISBN 1107015359.


Revision as of 16:47, 18 May 2018 by Nbro (extended confirmed user, 3,319 edits). Edit summary: "Citation of the book from which the text of this article was taken added". Tag: Visual edit.
Revision as of 16:48, 18 May 2018 by Nbro (extended confirmed user, 3,319 edits). No edit summary.
Line 1 (revision as of 16:47):
The '''BFR algorithm''', named after its inventors Bradley, Fayyad and Reina, is a variant of [[k-means clustering|k-means algorithm]] that is designed to cluster data in a high-dimensional [[Euclidean space]]. It makes a very strong assumption about the shape of clusters: they must be [[Normal distribution|normally distributed]] about a centroid. The mean and standard deviation for a cluster may differ for different dimensions, but the dimensions must be independent.<ref>{{Cite book|title=Mining of Massive Datasets|last=Rajaraman|first=Anand|last2=Ullman|first2=Jeffrey|last3=Leskovec|first3=Jure|publisher=Cambridge University Press|year=2011|isbn=1107015359|location=New York, NY, USA|pages=257-258}}</ref>

Line 1 (revision as of 16:48):
The '''BFR algorithm''', named after its inventors Bradley, Fayyad and Reina, is a variant of [[k-means clustering|k-means algorithm]] that is designed to cluster data in a high-dimensional [[Euclidean space]]. It makes a very strong assumption about the shape of clusters: they must be [[Normal distribution|normally distributed]] about a [[Centroid|centroid]]. The [[mean]] and [[standard deviation]] for a cluster may differ for different dimensions, but the dimensions must be independent.<ref>{{Cite book|title=Mining of Massive Datasets|last=Rajaraman|first=Anand|last2=Ullman|first2=Jeffrey|last3=Leskovec|first3=Jure|publisher=Cambridge University Press|year=2011|isbn=1107015359|location=New York, NY, USA|pages=257-258}}</ref>