Constrained clustering: Difference between revisions

Revision as of 05:52, 14 March 2022

In computer science, constrained clustering is a class of semi-supervised learning algorithms. Typically, constrained clustering incorporates a set of must-link constraints, a set of cannot-link constraints, or both, into a data clustering algorithm. Each must-link or cannot-link constraint defines a relationship between two data instances. A must-link constraint specifies that the two instances should be placed in the same cluster. A cannot-link constraint specifies that the two instances should not be placed in the same cluster. Together, these constraints act as a guide: a constrained clustering algorithm attempts to find clusters in a data set that satisfy the specified must-link and cannot-link constraints. Some constrained clustering algorithms abort if no clustering satisfies the specified constraints. Others try to minimize the amount of constraint violation when no clustering satisfying all constraints can be found. Constraints can also be used to guide the selection of a clustering model among several possible solutions.^[1]
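The two constraint types can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a clustering is a list of integer cluster labels and that each constraint is a pair of instance indices; the function name `count_violations` is hypothetical and does not come from any cited paper. The count it returns is one plausible measure of the "amount of constraint violation" that soft-constraint methods try to keep small.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]  # a constraint relates two instance indices


def count_violations(
    labels: Sequence[int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> int:
    """Count how many pairwise constraints a given labeling breaks."""
    violated = 0
    # A must-link pair is violated when its two instances are in different clusters.
    for i, j in must_link:
        if labels[i] != labels[j]:
            violated += 1
    # A cannot-link pair is violated when its two instances share a cluster.
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            violated += 1
    return violated


# Four instances in two clusters (illustrative data, not from the article).
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3), (2, 3)]
print(count_violations(labels, must_link, cannot_link))
```

A labeling for which this function returns 0 satisfies every specified constraint.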

A cluster in which the members conform to all must-link and cannot-link constraints is called a chunklet.

Examples

Examples of constrained clustering algorithms include:

COP K-means ^[2]
PCKmeans (Pairwise Constrained K-means) ^[3]
CMWK-Means (Constrained Minkowski Weighted K-Means) ^[4]
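
As a rough illustration of how hard constraints can be built into k-means, the following Python sketch follows the general idea of COP K-means: each instance is assigned to the nearest center whose cluster does not violate a constraint with already-assigned instances, and the procedure gives up if no such cluster exists. It is a simplified sketch, not the authors' reference implementation. The names `cop_kmeans_sketch` and `violates`, the random initialization, and the fixed iteration cap are all assumptions made for this example.

```python
import random
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[int, int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violates(i: int, cluster: int, labels: List[int],
             must_link: Sequence[Pair], cannot_link: Sequence[Pair]) -> bool:
    """True if putting instance i into `cluster` breaks a constraint
    with an instance that already has a label (-1 means unassigned)."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] not in (-1, cluster):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == cluster:
            return True
    return False


def cop_kmeans_sketch(points: Sequence[Sequence[float]], k: int,
                      must_link: Sequence[Pair], cannot_link: Sequence[Pair],
                      iters: int = 20, seed: int = 0) -> Optional[List[int]]:
    """Return cluster labels, or None if the greedy assignment gets stuck."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(iters):
        new_labels = [-1] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest center.
            for c in sorted(range(k), key=lambda c: sq_dist(p, centers[c])):
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                return None  # no admissible cluster for instance i: abort
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(cop_kmeans_sketch(points, 2, must_link=[(0, 1)], cannot_link=[(0, 2)]))
```

Because the assignment is greedy and order-dependent, a `None` result here means only that this particular pass failed, not that no constraint-satisfying clustering exists.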

References

^ Pourrajabi, M.; Moulavi, D.; Campello, R. J. G. B.; Zimek, A.; Sander, J.; Goebel, R. (2014). "Model Selection for Semi-Supervised Clustering". Proceedings of the 17th International Conference on Extending Database Technology (EDBT). pp. 331–342. doi:10.5441/002/edbt.2014.31.
^ Wagstaff, K.; Cardie, C.; Rogers, S.; Schrödl, S. (2001). "Constrained K-means Clustering with Background Knowledge". Proceedings of the Eighteenth International Conference on Machine Learning. pp. 577–584.
^ http://www.cs.utexas.edu/~ml/papers/semi-sdm-04.pdf
^ de Amorim, R. C. (2012). "Constrained Clustering with Minkowski Weighted K-Means". Proceedings of the 13th IEEE International Symposium on Computational Intelligence and Informatics. pp. 13–17. doi:10.1109/CINTI.2012.6496753.


@@ Line 1: / Line 1: @@
-In [[computer science]], '''constrained clustering''' is a class of [[semi-supervised learning]] algorithms. Typically, constrained clustering incorporates either a set of must-link constraints, cannot-link constraints, or both, with a [[Data clustering]] algorithm. Both a must-link and a cannot-link constraint define a relationship between two data instances. A must-link constraint is used to specify that the two instances in the must-link relation should be associated with the same cluster. A cannot-link constraint is used to specify that the two instances in the cannot-link relation should ''not'' be associated with the same cluster. These sets of constraints acts as a guide for which a constrained clustering algorithm will attempt to find clusters in a data set which satisfy the specified must-link and cannot-link constraints. Some constrained clustering algorithms will abort if no such clustering exists which satisfies the specified constraints. Others will try to minimize the amount of constraint violation should it be impossible to find a clustering which satisfies the constraints. Constraints could also be used to guide the selection of a clustering model among several possible solutions. <ref name="pourrajabi">{{Cite conference
+In [[computer science]], '''constrained clustering''' is a class of [[semi-supervised learning]] algorithms. Typically, constrained clustering incorporates either a set of must-link constraints, cannot-link constraints, or both, with a [[Data clustering]] algorithm. Both a must-link and a cannot-link constraint define a relationship between two data instances. A must-link constraint is used to specify that the two instances in the must-link relation should be associated with the same cluster. A cannot-link constraint is used to specify that the two instances in the cannot-link relation should ''not'' be associated with the same cluster. These sets of constraints acts as a guide for which a constrained clustering algorithm will attempt to find clusters in a data set which satisfy the specified must-link and cannot-link constraints. Some constrained clustering algorithms will abort if no such clustering exists which satisfies the specified constraints. Others will try to minimize the amount of constraint violation should it be impossible to find a clustering which satisfies the constraints. Constraints could also be used to guide the selection of a clustering model among several possible solutions.<ref name="pourrajabi">{{Cite conference
 | first1 = M. | last1 = Pourrajabi
 | first2 = D. | last2 = Moulavi
@@ Line 27: / Line 27: @@
  | pages = 577&ndash;584
 }}</ref>
-* PCKmeans (Pairwise Constrained K-means) <ref>http://www.cs.utexas.edu/~ml/papers/semi-sdm-04.pdf</ref>
+* PCKmeans (Pairwise Constrained K-means) <ref>http://www.cs.utexas.edu/~ml/papers/semi-sdm-04.pdf {{Bare URL PDF|date=March 2022}}</ref>
 * CMWK-Means (Constrained Minkowski Weighted K-Means) <ref>{{Cite conference
  | first1 = R. C. |last1=de Amorim
@@ Line 41: / Line 41: @@
 {{Reflist|1}}
+[[Category:Cluster analysis algorithms]]
+[[Category:Cluster analysis]]
 {{comp-sci-stub}}
-[[Category:Cluster analysis algorithms]]
-[[Category:Cluster analysis]]