Automatic clustering algorithms - Revision history
Revision history for this page on the wiki · 2025-05-25T21:46:40Z · MediaWiki 1.45.0-wmf.2
https://en.wikipedia.org/w/index.php?action=history&feed=atom&title=Automatic_clustering_algorithms

Revision as of 12:02, 20 May 2025 · 140.105.167.53
https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1291311071&oldid=prev

This edit appends a new section after the existing paragraph on automatically generated clustering algorithms (unchanged context, shown once):

In the automation of density-based cluster identification, research has also focused on artificially generating the algorithms themselves. For instance, an estimation of distribution algorithm (EDA) guarantees the generation of valid algorithms through a [[directed acyclic graph]] (DAG), in which nodes represent procedures (building blocks) and edges represent possible execution sequences between two nodes. The building blocks determine the EDA's alphabet, that is, the components from which any generated algorithm is assembled. In experimental results, the artificially generated clustering algorithms are compared with DBSCAN, a manually designed algorithm.<ref>{{Cite book |date=June 2012 |pages=1–7 |language=en-US |doi=10.1109/CEC.2012.6252874 |citeseerx=10.1.1.308.9977 |chapter=AutoClustering: An estimation of distribution algorithm for the automatic generation of clustering algorithms |title=2012 IEEE Congress on Evolutionary Computation |last1=Meiguins |first1=Aruanda S. G. |last2=Limao |first2=Roberto C. |last3=Meiguins |first3=Bianchi S. |last4=Junior |first4=Samuel F. S. |last5=Freitas |first5=Alex A. |isbn=978-1-4673-1509-8}}</ref>
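The generation step can be illustrated with a minimal sketch; the DAG and building-block names below are hypothetical stand-ins, not the blocks used by AutoClustering. Any walk along the edges from the start node to the end node yields a valid algorithm by construction:

<syntaxhighlight lang="python">
import random

# Toy DAG: nodes are building blocks (procedures), edges are allowed
# successors. The names are illustrative placeholders only.
DAG = {
    "start": ["normalize", "estimate_density"],
    "normalize": ["estimate_density"],
    "estimate_density": ["assign_points"],
    "assign_points": ["merge_clusters", "end"],
    "merge_clusters": ["end"],
}

def sample_algorithm(dag):
    """Random walk from 'start' to 'end'; every path is a valid algorithm."""
    node, steps = "start", []
    while node != "end":
        node = random.choice(dag[node])
        if node != "end":
            steps.append(node)
    return steps

# e.g. ['normalize', 'estimate_density', 'assign_points', 'merge_clusters']
print(sample_algorithm(DAG))
</syntaxhighlight>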
The appended section reads:

== AutoML for Clustering ==

Recent advancements in automated machine learning (AutoML) have extended to the domain of clustering, where systems are designed to automatically select preprocessing techniques, feature transformations, clustering algorithms, and validation strategies without human intervention. Unlike traditional clustering methods that rely on fixed pipelines and manual tuning, AutoML-based clustering frameworks dynamically search for the best-performing configurations based on internal clustering validation indices (CVIs) or other unsupervised metrics.

An implementation in this area is TPOT-Clustering,<ref>https://github.com/Mcamilo/tpot-clustering/tree/main</ref> an extension of the Tree-based Pipeline Optimization Tool (TPOT), which automates the process of building clustering pipelines using genetic programming. TPOT-Clustering explores combinations of data transformations, dimensionality reduction methods, clustering algorithms (e.g., k-means, DBSCAN, agglomerative clustering), and scoring functions to optimize clustering performance. It uses an evolutionary algorithm to search the space of possible pipelines, guided by internal scores such as the silhouette or Davies–Bouldin index.

AutoML for clustering is particularly useful in domains where the structure of the data is unknown and manual tuning is infeasible due to the high dimensionality or complexity of the feature space. These approaches are gaining popularity in areas such as image segmentation, customer segmentation, and bioinformatics, where unsupervised insights are critical.
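TPOT-Clustering's own interface is not reproduced here; the sketch below only illustrates the underlying idea of CVI-guided pipeline search, assuming scikit-learn components and plain enumeration in place of genetic programming:

<syntaxhighlight lang="python">
import itertools
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Hypothetical, tiny search space; TPOT-Clustering evolves far richer pipelines.
scalers = [None, StandardScaler()]
reducers = [None, PCA(n_components=2)]
clusterers = [
    KMeans(n_clusters=4, n_init=10, random_state=0),
    DBSCAN(eps=0.8, min_samples=5),
    AgglomerativeClustering(n_clusters=4),
]

best_score, best_pipeline = -1.0, None
for scaler, reducer, clusterer in itertools.product(scalers, reducers, clusterers):
    Z = X
    if scaler is not None:
        Z = scaler.fit_transform(Z)
    if reducer is not None:
        Z = reducer.fit_transform(Z)
    labels = clusterer.fit_predict(Z)
    if len(set(labels)) < 2:  # silhouette needs at least two clusters
        continue
    # Internal CVI guides the search. Caveats glossed over for brevity:
    # noise points (-1) count as a cluster, and scores from differently
    # transformed feature spaces are not strictly comparable.
    score = silhouette_score(Z, labels)
    if score > best_score:
        best_score, best_pipeline = score, (scaler, reducer, clusterer)

print(best_score, best_pipeline)
</syntaxhighlight>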
class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{short description|Data processing algorithm}}</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{short description|Data processing algorithm}}</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td colspan="2" class="diff-empty diff-side-deleted"></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Automatic clustering algorithms''' are algorithms that can perform clustering without prior knowledge of data sets. In contrast with other [[cluster analysis]] techniques, automatic clustering algorithms can determine the optimal number of clusters even in the presence of noise and outlier points.&lt;ref&gt;[[Outlier]]&lt;/ref&gt;{{context needed|date=September 2021}}</div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>== Background ==</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Clustering, a core technique in data mining and machine learning, is an unsupervised learning method that groups similar data points into clusters based on defined similarity measures, such as Euclidean distance or cosine similarity. Unlike supervised learning, which relies on labeled data to train models, clustering operates without predefined class labels, seeking to uncover natural patterns or structures within the data. A key challenge in clustering is determining the optimal number of clusters (often denoted as ''k''), as this value significantly influences the quality and interpretability of the results. 
Traditional algorithms, such as k-means, require users to specify ''k'' in advance, which can be problematic in real-world applications like market segmentation, image analysis, or bioinformatics, where the true number of clusters is unknown or data complexity obscures clear groupings.</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><br /></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Automatic clustering algorithms address this challenge by autonomously estimating the number of clusters during the clustering process, eliminating the need for manual specification of ''k''. These algorithms employ techniques such as statistical criteria (e.g., Bayesian Information Criterion or Akaike Information Criterion), density-based approaches, or hierarchical splitting/merging to identify an optimal number of clusters, even in datasets with noise or outliers. By adapting to the data’s inherent structure, automatic clustering algorithms enhance the robustness and flexibility of unsupervised learning. Their ability to function without prior knowledge makes them invaluable for exploratory data analysis, large-scale data processing, and applications where human intervention is impractical.</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><br /></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The evolution of automatic clustering algorithms marks a significant milestone in unsupervised learning, enabling more efficient and scalable analysis of complex datasets. 
These methods empower data-driven discovery by automating a critical aspect of the clustering process, making them essential tools in fields ranging from scientific research to commercial data analytics.</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><br /></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>== Types of Automatic Clustering Algorithms ==</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Automatic clustering algorithms are grouped by their approach to determining cluster numbers and grouping data. Density-based methods identify clusters as high-density regions separated by low-density areas, estimating cluster counts without predefined input. For example, DBSCAN groups points within a radius (ε) with sufficient neighbors (MinPts), marking outliers as noise, and excels with arbitrary shapes but falters with varying densities. OPTICS extends DBSCAN with hierarchical density analysis for better handling of density variations, while HDBSCAN selects stable clusters from a density hierarchy, ideal for complex datasets.</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><br /></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Partitioning-based methods divide data by optimizing an objective function, automatically estimating the number of clusters (''k''). X-means extends k-means by splitting clusters and using the Bayesian Information Criterion (BIC) to select ''k'', assuming spherical clusters. 
G-means, similarly, splits clusters based on Gaussian distribution tests, fitting well-separated data.</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><br /></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Hierarchical methods construct a cluster hierarchy, either merging (agglomerative) or splitting (divisive), and choose the optimal level using metrics like silhouette score. Agglomerative clustering merges clusters via linkage criteria but is computationally intensive, while BIRCH incrementally builds a tree for large datasets, balancing speed and accuracy.</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><br /></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Model-based methods assume data arises from probability distributions, estimating both clusters and parameters. Gaussian Mixture Models (GMM) fit Gaussian distributions, selecting ''k'' with BIC or AIC, but struggle with non-Gaussian data. Variational Bayesian methods apply Bayesian inference for robust cluster estimation, reducing overfitting. 
Each approach suits specific data characteristics, such as cluster shape or noise levels, with selection depending on the dataset and application.</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Centroid-based ==</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Centroid-based ==</div></td> </tr> </table> MrOllie https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1290331489&oldid=prev Aasimayaz: Created a new section about automatic clustering algorithms 2025-05-14T04:25:53Z <p>Created a new section about automatic clustering algorithms</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 04:25, 14 May 2025</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 7:</td> <td colspan="2" class="diff-lineno">Line 7:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The evolution of automatic clustering algorithms marks a significant milestone in unsupervised learning, enabling more efficient and scalable analysis of complex datasets. 
== Types of Automatic Clustering Algorithms ==

Automatic clustering algorithms are grouped by how they determine the number of clusters and group the data. Density-based methods identify clusters as high-density regions separated by low-density areas, estimating cluster counts without predefined input. For example, DBSCAN groups points that have sufficient neighbors (MinPts) within a radius (ε), marking outliers as noise; it excels with arbitrarily shaped clusters but falters when densities vary. OPTICS extends DBSCAN with hierarchical density analysis to handle density variations better, while HDBSCAN selects stable clusters from a density hierarchy, making it well suited to complex datasets.
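A minimal DBSCAN example with scikit-learn; the parameter values are illustrative, and the point is that the number of clusters is inferred rather than supplied:

<syntaxhighlight lang="python">
# DBSCAN infers the number of clusters from density: eps is the radius,
# min_samples the neighbor threshold; points labeled -1 are noise.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.06, random_state=0)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"clusters found: {n_clusters}, noise points: {np.sum(labels == -1)}")
</syntaxhighlight>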
Partitioning-based methods divide the data by optimizing an objective function while estimating the number of clusters (''k'') automatically. X-means extends k-means by splitting clusters and using the Bayesian Information Criterion (BIC) to select ''k'', assuming spherical clusters. G-means similarly splits clusters based on tests for Gaussian distribution and suits well-separated data.

Hierarchical methods construct a cluster hierarchy, either by merging (agglomerative) or splitting (divisive), and choose the optimal level using metrics such as the silhouette score. Agglomerative clustering merges clusters via linkage criteria but is computationally intensive, while BIRCH incrementally builds a tree for large datasets, balancing speed and accuracy.
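A minimal sketch of choosing the cut level of an agglomerative hierarchy by silhouette score, assuming scikit-learn and an arbitrary candidate range:

<syntaxhighlight lang="python">
# Scan candidate cluster counts (cut levels) and keep the silhouette-best one.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=1)

scores = {}
for k in range(2, 8):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])  # typically 3 for three well-separated blobs
</syntaxhighlight>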
Model-based methods assume the data arise from probability distributions and estimate both the clusters and their parameters. Gaussian mixture models (GMMs) fit Gaussian distributions, selecting ''k'' with BIC or AIC, but struggle with non-Gaussian data. Variational Bayesian methods apply Bayesian inference for more robust cluster estimation with less overfitting.
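A minimal sketch of the BIC-based selection of ''k'' with a Gaussian mixture in scikit-learn; the candidate range is arbitrary:

<syntaxhighlight lang="python">
# Fit Gaussian mixtures over a range of component counts; keep the
# count that minimizes BIC (BIC is minimized, unlike silhouette).
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=400, centers=4, random_state=2)

bics = {}
for k in range(1, 9):
    gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(X)
    bics[k] = gmm.bic(X)

best_k = min(bics, key=bics.get)
print(best_k)  # typically 4
</syntaxhighlight>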
Each approach suits specific data characteristics, such as cluster shape or noise level, and the choice depends on the dataset and application.

Revision as of 04:23, 14 May 2025 · Aasimayaz: "grammatical fix in the background"
https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1290331251&oldid=prev

This edit rewrites the three paragraphs of the == Background == section. The new wording:

Clustering, a core technique in data mining and machine learning, is an unsupervised learning method that groups similar data points into clusters based on defined similarity measures, such as Euclidean distance or cosine similarity. Unlike supervised learning, which relies on labeled data to train models, clustering operates without predefined class labels, seeking to uncover natural patterns or structures within the data. A key challenge in clustering is determining the optimal number of clusters (often denoted as ''k''), as this value significantly influences the quality and interpretability of the results. Traditional algorithms, such as k-means, require users to specify ''k'' in advance, which can be problematic in real-world applications like market segmentation, image analysis, or bioinformatics, where the true number of clusters is unknown or data complexity obscures clear groupings.
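The contrast described in this paragraph can be made concrete with scikit-learn: k-means needs ''k'' up front, whereas an automatic method such as HDBSCAN infers it (sklearn.cluster.HDBSCAN requires scikit-learn 1.3 or later). A minimal sketch:

<syntaxhighlight lang="python">
from sklearn.cluster import HDBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=3)

kmeans_labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)  # k given
hdbscan_labels = HDBSCAN(min_cluster_size=10).fit_predict(X)    # k inferred

print(len(set(hdbscan_labels) - {-1}))  # number of clusters found
</syntaxhighlight>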
text-decoration: none;">clustering</del> <del style="font-weight: bold; text-decoration: none;">process.</del> <del style="font-weight: bold; text-decoration: none;">'''Automatic</del> <del style="font-weight: bold; text-decoration: none;">clustering</del> <del style="font-weight: bold; text-decoration: none;">algorithms</del>''' <del style="font-weight: bold; text-decoration: none;">are</del> algorithms <del style="font-weight: bold; text-decoration: none;">that</del> <del style="font-weight: bold; text-decoration: none;">can</del> <del style="font-weight: bold; text-decoration: none;">perform</del> <del style="font-weight: bold; text-decoration: none;">clustering</del> <del style="font-weight: bold; text-decoration: none;">without</del> <del style="font-weight: bold; text-decoration: none;">prior</del> <del style="font-weight: bold; text-decoration: none;">knowledge</del> <del style="font-weight: bold; text-decoration: none;">of</del> <del style="font-weight: bold; text-decoration: none;">data</del> <del style="font-weight: bold; text-decoration: none;">sets.</del> <del style="font-weight: bold; text-decoration: none;">In</del> <del style="font-weight: bold; text-decoration: none;">contrast</del> <del style="font-weight: bold; text-decoration: none;">with</del> <del style="font-weight: bold; text-decoration: none;">other</del> <del style="font-weight: bold; text-decoration: none;">[[cluster</del> <del style="font-weight: bold; text-decoration: none;">analysis]] techniques</del>, <del style="font-weight: bold; text-decoration: none;">automatic</del> <del style="font-weight: bold; text-decoration: none;">clustering</del> <del style="font-weight: bold; text-decoration: none;">algorithms</del> <del style="font-weight: bold; text-decoration: none;">can</del> <del style="font-weight: bold; text-decoration: none;">determine</del> <del style="font-weight: bold; text-decoration: none;">the</del> optimal number of clusters even in <del style="font-weight: bold; text-decoration: none;">the</del> <del style="font-weight: bold; text-decoration: none;">presence of</del> noise <del style="font-weight: bold; text-decoration: none;">and</del> <del style="font-weight: bold; text-decoration: none;">outlier points</del>.<del style="font-weight: bold; text-decoration: none;">&lt;ref&gt;[[Outlier]]&lt;/ref&gt;</del> <del style="font-weight: bold; text-decoration: none;">These</del> <del style="font-weight: bold; text-decoration: none;">methods</del> <del style="font-weight: bold; text-decoration: none;">incorporate</del> <del style="font-weight: bold; text-decoration: none;">strategies</del> <del style="font-weight: bold; text-decoration: none;">such</del> <del style="font-weight: bold; text-decoration: none;">as</del> <del style="font-weight: bold; text-decoration: none;">statistical model selection (e.g.</del>, <del style="font-weight: bold; text-decoration: none;">BIC</del> <del style="font-weight: bold; text-decoration: none;">or</del> <del style="font-weight: bold; text-decoration: none;">AIC),</del> <del style="font-weight: bold; text-decoration: none;">density</del> <del style="font-weight: bold; text-decoration: none;">estimation,</del> <del style="font-weight: bold; text-decoration: none;">or</del> <del style="font-weight: bold; text-decoration: none;">hierarchical</del> <del style="font-weight: bold; text-decoration: none;">merging/splitting</del> <del style="font-weight: bold; text-decoration: none;">to</del> <del style="font-weight: bold; text-decoration: none;">adaptively</del> <del 
style="font-weight: bold; text-decoration: none;">find a suitable number of clusters</del>. Their ability to <del style="font-weight: bold; text-decoration: none;">operate</del> without <del style="font-weight: bold; text-decoration: none;">manual</del> <del style="font-weight: bold; text-decoration: none;">input</del> makes them <del style="font-weight: bold; text-decoration: none;">particularly useful</del> for exploratory data analysis<del style="font-weight: bold; text-decoration: none;"> and</del> large-scale applications where <del style="font-weight: bold; text-decoration: none;">user</del> <del style="font-weight: bold; text-decoration: none;">supervision</del> is <del style="font-weight: bold; text-decoration: none;">limited</del>.</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Automatic clustering algorithms address this <ins style="font-weight: bold; text-decoration: none;">challenge</ins> by<ins style="font-weight: bold; text-decoration: none;"> autonomously</ins> estimating the number of clusters <ins style="font-weight: bold; text-decoration: none;">during</ins> <ins style="font-weight: bold; text-decoration: none;">the</ins> <ins style="font-weight: bold; text-decoration: none;">clustering</ins> <ins style="font-weight: bold; text-decoration: none;">process, eliminating</ins> the <ins style="font-weight: bold; text-decoration: none;">need</ins> <ins style="font-weight: bold; text-decoration: none;">for</ins> <ins style="font-weight: bold; text-decoration: none;">manual</ins> <ins style="font-weight: bold; text-decoration: none;">specification of</ins> ''<ins style="font-weight: bold; text-decoration: none;">k</ins>'<ins style="font-weight: bold; text-decoration: none;">'.</ins> <ins style="font-weight: bold; text-decoration: none;">These</ins> algorithms <ins style="font-weight: bold; text-decoration: none;">employ</ins> <ins style="font-weight: bold; text-decoration: none;">techniques</ins> <ins style="font-weight: bold; text-decoration: none;">such</ins> <ins style="font-weight: bold; text-decoration: none;">as</ins> <ins style="font-weight: bold; text-decoration: none;">statistical</ins> <ins style="font-weight: bold; text-decoration: none;">criteria</ins> <ins style="font-weight: bold; text-decoration: none;">(e.g.,</ins> <ins style="font-weight: bold; text-decoration: none;">Bayesian</ins> <ins style="font-weight: bold; text-decoration: none;">Information</ins> <ins style="font-weight: bold; text-decoration: none;">Criterion</ins> <ins style="font-weight: bold; text-decoration: none;">or</ins> <ins style="font-weight: bold; text-decoration: none;">Akaike</ins> <ins style="font-weight: bold; text-decoration: none;">Information</ins> <ins style="font-weight: bold; text-decoration: none;">Criterion),</ins> <ins style="font-weight: bold; text-decoration: none;">density-based</ins> <ins style="font-weight: bold; text-decoration: none;">approaches</ins>, <ins style="font-weight: bold; text-decoration: none;">or</ins> <ins style="font-weight: bold; text-decoration: none;">hierarchical</ins> <ins style="font-weight: bold; text-decoration: none;">splitting/merging</ins> <ins style="font-weight: bold; text-decoration: none;">to</ins> <ins style="font-weight: bold; text-decoration: none;">identify</ins> <ins style="font-weight: bold; text-decoration: none;">an</ins> optimal number of 
clusters<ins style="font-weight: bold; text-decoration: none;">,</ins> even in <ins style="font-weight: bold; text-decoration: none;">datasets</ins> <ins style="font-weight: bold; text-decoration: none;">with</ins> noise <ins style="font-weight: bold; text-decoration: none;">or</ins> <ins style="font-weight: bold; text-decoration: none;">outliers</ins>. <ins style="font-weight: bold; text-decoration: none;">By</ins> <ins style="font-weight: bold; text-decoration: none;">adapting</ins> <ins style="font-weight: bold; text-decoration: none;">to</ins> <ins style="font-weight: bold; text-decoration: none;">the</ins> <ins style="font-weight: bold; text-decoration: none;">data’s</ins> <ins style="font-weight: bold; text-decoration: none;">inherent</ins> <ins style="font-weight: bold; text-decoration: none;">structure</ins>, <ins style="font-weight: bold; text-decoration: none;">automatic</ins> <ins style="font-weight: bold; text-decoration: none;">clustering</ins> <ins style="font-weight: bold; text-decoration: none;">algorithms</ins> <ins style="font-weight: bold; text-decoration: none;">enhance</ins> <ins style="font-weight: bold; text-decoration: none;">the</ins> <ins style="font-weight: bold; text-decoration: none;">robustness</ins> <ins style="font-weight: bold; text-decoration: none;">and</ins> <ins style="font-weight: bold; text-decoration: none;">flexibility</ins> <ins style="font-weight: bold; text-decoration: none;">of</ins> <ins style="font-weight: bold; text-decoration: none;">unsupervised</ins> <ins style="font-weight: bold; text-decoration: none;">learning</ins>. Their ability to <ins style="font-weight: bold; text-decoration: none;">function</ins> without <ins style="font-weight: bold; text-decoration: none;">prior</ins> <ins style="font-weight: bold; text-decoration: none;">knowledge</ins> makes them <ins style="font-weight: bold; text-decoration: none;">invaluable</ins> for exploratory data analysis<ins style="font-weight: bold; text-decoration: none;">,</ins> large-scale<ins style="font-weight: bold; text-decoration: none;"> data processing, and</ins> applications where <ins style="font-weight: bold; text-decoration: none;">human</ins> <ins style="font-weight: bold; text-decoration: none;">intervention</ins> is <ins style="font-weight: bold; text-decoration: none;">impractical</ins>.</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The <del style="font-weight: bold; text-decoration: none;">development</del> of automatic clustering algorithms <del style="font-weight: bold; text-decoration: none;">represents</del> <del style="font-weight: bold; text-decoration: none;">an</del> <del style="font-weight: bold; text-decoration: none;">important</del> <del style="font-weight: bold; text-decoration: none;">advancement</del> in unsupervised learning, <del 
style="font-weight: bold; text-decoration: none;">allowing for</del> more <del style="font-weight: bold; text-decoration: none;">autonomous</del> and data-driven discovery <del style="font-weight: bold; text-decoration: none;">processes</del> in <del style="font-weight: bold; text-decoration: none;">complex</del> <del style="font-weight: bold; text-decoration: none;">datasets.</del> <del style="font-weight: bold; text-decoration: none;">{{context</del> <del style="font-weight: bold; text-decoration: none;">needed|date=September</del> <del style="font-weight: bold; text-decoration: none;">2021}}</del></div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The <ins style="font-weight: bold; text-decoration: none;">evolution</ins> of automatic clustering algorithms <ins style="font-weight: bold; text-decoration: none;">marks</ins> <ins style="font-weight: bold; text-decoration: none;">a</ins> <ins style="font-weight: bold; text-decoration: none;">significant</ins> <ins style="font-weight: bold; text-decoration: none;">milestone</ins> in unsupervised learning, <ins style="font-weight: bold; text-decoration: none;">enabling</ins> more <ins style="font-weight: bold; text-decoration: none;">efficient</ins> and<ins style="font-weight: bold; text-decoration: none;"> scalable analysis of complex datasets. These methods empower</ins> data-driven discovery <ins style="font-weight: bold; text-decoration: none;">by automating a critical aspect of the clustering process, making them essential tools</ins> in <ins style="font-weight: bold; text-decoration: none;">fields</ins> <ins style="font-weight: bold; text-decoration: none;">ranging</ins> <ins style="font-weight: bold; text-decoration: none;">from</ins> <ins style="font-weight: bold; text-decoration: none;">scientific</ins> <ins style="font-weight: bold; text-decoration: none;">research to commercial data analytics.</ins></div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Centroid-based ==</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Centroid-based ==</div></td> </tr> </table> Aasimayaz https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1290312723&oldid=prev Aasimayaz: changed the heading style 2025-05-14T01:24:57Z <p>changed the heading style</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col 
class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 01:24, 14 May 2025</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 1:</td> <td colspan="2" class="diff-lineno">Line 1:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{short description|Data processing algorithm}}</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{short description|Data processing algorithm}}</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">=</del>== <del style="font-weight: bold; text-decoration: none;">'''</del>Background<del style="font-weight: bold; text-decoration: none;">'''</del> <del style="font-weight: bold; text-decoration: none;">=</del>==</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>== Background ==</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In data mining and machine learning, clustering is an unsupervised learning technique used to group similar data points into clusters based on defined similarity metrics. Unlike supervised learning, where labeled data guides the model, clustering operates without prior knowledge of class labels, aiming instead to discover inherent groupings within the dataset. A central challenge in clustering is determining the optimal number of clusters (k). Traditional clustering algorithms like k-means require the number of clusters to be specified beforehand. 
However, in many real-world scenarios—such as customer segmentation, anomaly detection, or gene expression analysis—the appropriate value of ''k'' is not known a priori and may be highly sensitive to the structure of the data.</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In data mining and machine learning, clustering is an unsupervised learning technique used to group similar data points into clusters based on defined similarity metrics. Unlike supervised learning, where labeled data guides the model, clustering operates without prior knowledge of class labels, aiming instead to discover inherent groupings within the dataset. A central challenge in clustering is determining the optimal number of clusters (k). Traditional clustering algorithms like k-means require the number of clusters to be specified beforehand. However, in many real-world scenarios—such as customer segmentation, anomaly detection, or gene expression analysis—the appropriate value of ''k'' is not known a priori and may be highly sensitive to the structure of the data.</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> </table> Aasimayaz https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1290312693&oldid=prev Aasimayaz: Added a new about background to highlight the problem 2025-05-14T01:24:36Z <p>Added a new about background to highlight the problem</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 01:24, 14 May 2025</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 1:</td> <td colspan="2" class="diff-lineno">Line 1:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{short description|Data processing algorithm}}</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{short description|Data processing algorithm}}</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td 
class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td colspan="2" class="diff-empty diff-side-deleted"></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>=== '''Background''' ===</div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Automatic clustering algorithms''' are algorithms that can perform clustering without prior knowledge of data sets. In contrast with other [[cluster analysis]] techniques, automatic clustering algorithms can determine the optimal number of clusters even in the presence of noise and outlier points.&lt;ref&gt;[[Outlier]]&lt;/ref&gt;{{context needed|date=September 2021}}</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td colspan="2" class="diff-empty diff-side-deleted"></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In data mining and machine learning, clustering is an unsupervised learning technique used to group similar data points into clusters based on defined similarity metrics. Unlike supervised learning, where labeled data guides the model, clustering operates without prior knowledge of class labels, aiming instead to discover inherent groupings within the dataset. A central challenge in clustering is determining the optimal number of clusters (k). Traditional clustering algorithms like k-means require the number of clusters to be specified beforehand. However, in many real-world scenarios—such as customer segmentation, anomaly detection, or gene expression analysis—the appropriate value of ''k'' is not known a priori and may be highly sensitive to the structure of the data.</div></td> </tr> <tr> <td colspan="2" class="diff-empty diff-side-deleted"></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td colspan="2" class="diff-empty diff-side-deleted"></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Automatic clustering algorithms are designed to address this limitation by estimating the number of clusters automatically as part of the clustering process. '''Automatic clustering algorithms''' are algorithms that can perform clustering without prior knowledge of data sets. 
In contrast with other [[cluster analysis]] techniques, automatic clustering algorithms can determine the optimal number of clusters even in the presence of noise and outlier points.&lt;ref&gt;[[Outlier]]&lt;/ref&gt; These methods incorporate strategies such as statistical model selection (e.g., BIC or AIC), density estimation, or hierarchical merging/splitting to adaptively find a suitable number of clusters. Their ability to operate without manual input makes them particularly useful for exploratory data analysis and large-scale applications where user supervision is limited.</div></td> </tr> <tr> <td colspan="2" class="diff-empty diff-side-deleted"></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td colspan="2" class="diff-empty diff-side-deleted"></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The development of automatic clustering algorithms represents an important advancement in unsupervised learning, allowing for more autonomous and data-driven discovery processes in complex datasets. {{context needed|date=September 2021}}</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Centroid-based ==</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Centroid-based ==</div></td> </tr> </table> Aasimayaz https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1289813445&oldid=prev Headbomb: /* Density-based */ | Altered template type. Add: isbn, title, chapter, authors 1-5. | Use this tool. Report bugs. | #UCB_Gadget 2025-05-11T01:11:00Z <p><span class="autocomment">Density-based: </span> | Altered template type. Add: isbn, title, chapter, authors 1-5. | <a href="/wiki/Wikipedia:UCB" class="mw-redirect" title="Wikipedia:UCB">Use this tool</a>. <a href="/wiki/Wikipedia:DBUG" class="mw-redirect" title="Wikipedia:DBUG">Report bugs</a>. 
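A minimal sketch of the "statistical model selection" strategy mentioned in the added text, assuming scikit-learn and a synthetic dataset (the candidate range 1–9 and all parameter values are illustrative, not from the cited sources): fit one Gaussian mixture per candidate k and keep the k with the lowest BIC.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data whose true group count is hidden from the selection loop.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=0)

# Fit one Gaussian mixture per candidate k and score it with BIC;
# lower BIC means a better fit/complexity trade-off.
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 10)}
best_k = min(bic, key=bic.get)
print("BIC-selected number of clusters:", best_k)
```

AIC works the same way through the model's `.aic(X)` method; it penalizes model complexity less strongly than BIC.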
https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1289813445&oldid=prev
Headbomb: /* Density-based */ | Altered template type. Add: isbn, title, chapter, authors 1-5. | Use this tool. Report bugs. | #UCB_Gadget
2025-05-11T01:11:00Z
  The density-based clustering algorithm uses autonomous machine learning that identifies patterns regarding geographical location and distance to a particular number of neighbors. It is considered autonomous because a priori knowledge on what is a cluster is not required.<ref>{{Cite web|url=http://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/how-density-based-clustering-works.htm|title=How Density-based Clustering works—ArcGIS Pro {{!}} ArcGIS Desktop|website=pro.arcgis.com|language=en|access-date=2018-11-05}}</ref> This type of algorithm provides different methods to find clusters in the data. The fastest method is [[DBSCAN]], which uses a defined distance to differentiate between dense groups of information and sparser noise. Moreover, HDBSCAN can self-adjust by using a range of distances instead of a specified one. Lastly, the method [[OPTICS algorithm|OPTICS]] creates a reachability plot based on the distance from neighboring features to separate noise from clusters of varying density.
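All three methods named in the paragraph above have scikit-learn implementations, so a brief usage sketch is possible; the eps, min_samples, and min_cluster_size values below are arbitrary choices for a toy dataset, not recommended defaults.

```python
from sklearn.cluster import DBSCAN, HDBSCAN, OPTICS  # HDBSCAN needs scikit-learn >= 1.3
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=400, noise=0.05, random_state=0)

# DBSCAN: one fixed distance (eps) separates dense regions from sparse noise.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

# HDBSCAN: effectively sweeps a range of distances instead of a single eps.
hdb = HDBSCAN(min_cluster_size=10).fit(X)

# OPTICS: orders points by reachability distance; valleys in the
# reachability plot correspond to clusters of varying density.
opt = OPTICS(min_samples=10).fit(X)

for name, labels in [("DBSCAN", db.labels_), ("HDBSCAN", hdb.labels_),
                     ("OPTICS", opt.labels_)]:
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise
    print(name, "found", n_clusters, "clusters")
```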
  These methods still require the user to provide the cluster center and cannot be considered automatic. The Automatic Local Density Clustering Algorithm (ALDC) is an example of the new research focused on developing automatic density-based clustering. ALDC works out local density and distance deviation of every point, thus expanding the difference between the potential cluster center and other points. This expansion allows the machine to work automatically. The machine identifies cluster centers and assigns the points that are left by their closest neighbor of higher density.
− ''<ref>{{Cite book|title=An algorithm for automatic recognition of cluster centers based on local density clustering - IEEE Conference Publication|date=May 2017 |pages=1347–1351 |language=en-US|doi=10.1109/CCDC.2017.7978726|isbn=978-1-5090-4657-7 |s2cid=23267464 }}</ref>''
+ ''<ref>{{Cite book |date=May 2017 |pages=1347–1351 |language=en-US|doi=10.1109/CCDC.2017.7978726|isbn=978-1-5090-4657-7 |s2cid=23267464 |chapter=An algorithm for automatic recognition of cluster centers based on local density clustering |title=2017 29th Chinese Control and Decision Conference (CCDC) |last1=Xuanzuo |first1=Ye |last2=Dinghao |first2=Li |last3=Xiongxiong |first3=He }}</ref>''
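Read literally, the rule described above (centers where local density and the distance to denser points are both large; remaining points inherit the label of their nearest denser neighbor) can be sketched as follows. This is an interpretive reconstruction with a fixed cutoff distance dc and a fixed number of centers, not the published ALDC implementation, which selects centers automatically.

```python
import numpy as np
from scipy.spatial.distance import cdist

def density_peak_sketch(X, dc=1.0, n_centers=3):
    """Toy density-peak clustering: rho = local density, delta = distance
    to the nearest point of higher density; large rho*delta marks centers."""
    d = cdist(X, X)
    rho = (d < dc).sum(axis=1) - 1              # neighbors within dc (minus self)
    order = np.argsort(-rho)                    # point indices, densest first
    delta = np.zeros(len(X))
    nearest_denser = np.zeros(len(X), dtype=int)
    delta[order[0]] = d[order[0]].max()         # densest point: use max distance
    nearest_denser[order[0]] = order[0]
    for rank in range(1, len(X)):
        p, denser = order[rank], order[:rank]
        j = denser[np.argmin(d[p, denser])]     # closest point of higher density
        delta[p], nearest_denser[p] = d[p, j], j
    centers = np.argsort(-(rho * delta))[:n_centers]
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(n_centers)
    if labels[order[0]] == -1:                  # the densest point must seed a
        labels[order[0]] = 0                    # cluster; usually it is a center
    for p in order:                             # sweep in decreasing density, so
        if labels[p] == -1:                     # the denser neighbor is labeled
            labels[p] = labels[nearest_denser[p]]
    return labels

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 2))
X[30:60] += 6; X[60:] += 12                     # three loose groups on a diagonal
print(density_peak_sketch(X, dc=2.0, n_centers=3)[:10])
```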
none;">10</del>.1<del style="font-weight: bold; text-decoration: none;">.1.308.9977</del>}}&lt;/ref&gt;</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In the automation of data density to identify clusters, research has also been focused on artificially generating the algorithms. For instance, the Estimation of Distribution Algorithms guarantees the generation of valid algorithms by the [[directed acyclic graph]] (DAG), in which nodes represent procedures (building block) and edges represent possible execution sequences between two nodes. Building Blocks determine the EDA's alphabet or, in other words, any generated algorithm. Clustering algorithms artificially generated are compared to DBSCAN, a manual algorithm, in experimental results.&lt;ref&gt;{{Cite <ins style="font-weight: bold; text-decoration: none;">book </ins>|<ins style="font-weight: bold; text-decoration: none;">date</ins>=<ins style="font-weight: bold; text-decoration: none;">June </ins>2012 <ins style="font-weight: bold; text-decoration: none;">|pages=1–7</ins> <ins style="font-weight: bold; text-decoration: none;">|language=en-US|doi=10.1109/CEC.2012.6252874|citeseerx=10.1.1.308.9977</ins> |<ins style="font-weight: bold; text-decoration: none;">chapter</ins>=AutoClustering: An estimation of distribution algorithm for the automatic generation of clustering algorithms <ins style="font-weight: bold; text-decoration: none;">|title=2012</ins> IEEE <ins style="font-weight: bold; text-decoration: none;">Congress on Evolutionary Computation</ins> |<ins style="font-weight: bold; text-decoration: none;">last1</ins>=<ins style="font-weight: bold; text-decoration: none;">Meiguins</ins> <ins style="font-weight: bold; text-decoration: none;">|first1=Aruanda S. G.</ins> |<ins style="font-weight: bold; text-decoration: none;">last2</ins>=<ins style="font-weight: bold; text-decoration: none;">Limao</ins> |<ins style="font-weight: bold; text-decoration: none;">first2</ins>=<ins style="font-weight: bold; text-decoration: none;">Roberto C. 
</ins>|<ins style="font-weight: bold; text-decoration: none;">last3</ins>=<ins style="font-weight: bold; text-decoration: none;">Meiguins |first3=Bianchi S</ins>.<ins style="font-weight: bold; text-decoration: none;"> |last4=Junior |first4=Samuel F</ins>.<ins style="font-weight: bold; text-decoration: none;"> S</ins>.<ins style="font-weight: bold; text-decoration: none;"> </ins>|<ins style="font-weight: bold; text-decoration: none;">last5</ins>=<ins style="font-weight: bold; text-decoration: none;">Freitas |first5=Alex A</ins>.<ins style="font-weight: bold; text-decoration: none;"> |isbn=978-</ins>1<ins style="font-weight: bold; text-decoration: none;">-4673-1509-8 </ins>}}&lt;/ref&gt;</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== References ==</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== References ==</div></td> </tr> </table> Headbomb https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1281321886&oldid=prev LooksGreatInATurtleNeck: There was a Script warning on the page from a cite journal template, "Cite journal requires |journal=", fixed by adding a journal= field & filling it in 2025-03-19T17:30:22Z <p>There was a Script warning on the page from a cite journal template, &quot;Cite journal requires |journal=&quot;, fixed by adding a journal= field &amp; filling it in</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 17:30, 19 March 2025</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 26:</td> <td colspan="2" class="diff-lineno">Line 26:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>These methods still require the user to provide the cluster center and cannot be considered automatic. The Automatic Local Density Clustering Algorithm (ALDC) is an example of the new research focused on developing automatic density-based clustering. ALDC works out local density and distance deviation of every point, thus expanding the difference between the potential cluster center and other points. 
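To make the DAG encoding concrete, here is a toy rendition, far simpler than the AutoClustering system in the cited paper: each node is a building-block procedure, each edge an allowed execution sequence, and any START-to-END walk through the DAG is a valid generated algorithm. A real EDA learns a probability distribution over such walks; this sketch merely samples them uniformly and keeps the best-scoring pipeline. All block names and parameters here are illustrative assumptions.

```python
import random
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# DAG of building blocks: node -> nodes that may legally follow it.
dag = {"START": ["std", "minmax"],
       "std": ["kmeans", "dbscan"], "minmax": ["kmeans", "dbscan"],
       "kmeans": ["END"], "dbscan": ["END"]}
run = {"std": lambda Z: StandardScaler().fit_transform(Z),
       "minmax": lambda Z: MinMaxScaler().fit_transform(Z),
       "kmeans": lambda Z: KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z),
       "dbscan": lambda Z: DBSCAN(eps=0.3, min_samples=5).fit_predict(Z)}

def sample_algorithm(rng):
    """Walk the DAG from START to END; every walk is a valid algorithm."""
    node, path = "START", []
    while (node := rng.choice(dag[node])) != "END":
        path.append(node)
    return path

rng, best = random.Random(0), None
for _ in range(10):
    path = sample_algorithm(rng)
    Z = X
    for block in path[:-1]:                  # preprocessing blocks
        Z = run[block](Z)
    labels = run[path[-1]](Z)                # final clustering block
    if len(set(labels)) > 1:                 # silhouette needs >= 2 labels
        score = silhouette_score(Z, labels)
        if best is None or score > best[0]:
            best = (score, path)
print("Best generated algorithm:", best)
```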
https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1281321886&oldid=prev
LooksGreatInATurtleNeck: There was a Script warning on the page from a cite journal template, "Cite journal requires |journal=", fixed by adding a journal= field & filling it in
2025-03-19T17:30:22Z
− <ref>{{Cite journal|title=AutoClustering: An estimation of distribution algorithm for the automatic generation of clustering algorithms - IEEE Conference Publication|date=June 2012 |pages=1–7 |language=en-US|doi=10.1109/CEC.2012.6252874|citeseerx=10.1.1.308.9977}}</ref>
+ <ref>{{Cite journal|journal=2012 IEEE Congress on Evolutionary Computation|title=AutoClustering: An estimation of distribution algorithm for the automatic generation of clustering algorithms - IEEE Conference Publication|date=June 2012 |pages=1–7 |language=en-US|doi=10.1109/CEC.2012.6252874|citeseerx=10.1.1.308.9977}}</ref>

https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1281321179&oldid=prev
LooksGreatInATurtleNeck: There was a Script warning on the page from a cite book template, "Category:CS1 maint: date and year", fixed by removing the redundant year= field as date= was already set
2025-03-19T17:24:21Z
  Automated selection of ''k'' in a [[K-means clustering|''K''-means clustering algorithm]], one of the most used centroid-based clustering algorithms, is still a major problem in machine learning. The most accepted solution to this problem is the [[Elbow method (clustering)|elbow method]]. It consists of running ''k''-means clustering to the data set with a range of values, calculating the sum of squared errors for each, and plotting them in a line chart. If the chart looks like an arm, the best value of ''k'' will be on the "elbow".<ref>{{Cite web|url=https://bl.ocks.org/rpgove/0060ff3b656618e9136b|title=Using the elbow method to determine the optimal number of clusters for k-means clustering|website=bl.ocks.org|access-date=2018-11-12}}</ref>
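As a sketch of the procedure just described (scikit-learn assumed; the dataset and the 1–9 candidate range are illustrative), the inertia_ attribute of a fitted KMeans is exactly the sum of squared errors one would plot:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Sum of squared errors (inertia) per candidate k; drawn as a line chart,
# the bend ("elbow") in this curve suggests the best k.
sse = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 10)}
for k, v in sse.items():
    print(k, round(v, 1))
```

Reading off the elbow is left to the analyst, which is part of why fully automated selection of ''k'' remains an open problem.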
  Another method that modifies the ''k''-means algorithm for automatically choosing the optimal number of clusters is the ''G''-means algorithm. It was developed from the hypothesis that a subset of the data follows a Gaussian distribution. Thus, ''k'' is increased until each ''k''-means center's data is Gaussian. This algorithm only requires the standard statistical significance level as a parameter and does not set limits for the covariance of the data.
− <ref>{{cite conference |url=https://proceedings.neurips.cc/paper/2003/file/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf |title=Learning the k in k-means |last1=Hamerly |first1=Greg |last2=Elkan |first2=Charles |date=9 December 2003 |year=2003 |conference=Proceedings of the 16th International Conference on Neural Information Processing Systems |conference-url=https://dl.acm.org/doi/proceedings/10.5555/2981345 |editor=Sebastian Thrun |editor2=Lawrence K Saul |editor3=Bernhard H Schölkopf|publisher=MIT Press |archive-url=https://web.archive.org/web/20221016235553/https://proceedings.neurips.cc/paper/2003/file/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf |archive-date=16 October 2022 |location=Whistler, British Columbia, Canada |pages=281–288 |access-date=3 November 2022 |quote= |language=en-us }}</ref>
+ <ref>{{cite conference |url=https://proceedings.neurips.cc/paper/2003/file/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf |title=Learning the k in k-means |last1=Hamerly |first1=Greg |last2=Elkan |first2=Charles |date=9 December 2003 |conference=Proceedings of the 16th International Conference on Neural Information Processing Systems |conference-url=https://dl.acm.org/doi/proceedings/10.5555/2981345 |editor=Sebastian Thrun |editor2=Lawrence K Saul |editor3=Bernhard H Schölkopf|publisher=MIT Press |archive-url=https://web.archive.org/web/20221016235553/https://proceedings.neurips.cc/paper/2003/file/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf |archive-date=16 October 2022 |location=Whistler, British Columbia, Canada |pages=281–288 |access-date=3 November 2022 |quote= |language=en-us }}</ref>
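A condensed sketch of that loop, as an interpretive reading of the description above rather than the authors' reference code: scipy's Anderson-Darling normality test stands in for the statistical test, at roughly the 5% significance level, and instead of splitting only the failing centers the sketch simply retries with one more center.

```python
import numpy as np
from scipy.stats import anderson
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def center_is_gaussian(points, crit_index=2):      # index 2 ~ 5% level in scipy
    """Split the points in two and Anderson-Darling-test their 1-D
    projection onto the axis joining the two sub-centers."""
    if len(points) < 8:                            # too few points to test
        return True
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    v = km.cluster_centers_[0] - km.cluster_centers_[1]
    proj = points @ v / (v @ v)                    # project onto split axis
    res = anderson(proj)                           # normality test
    return res.statistic < res.critical_values[crit_index]

def g_means_sketch(X, max_k=15):
    """Grow k until every center's points pass the Gaussianity test.
    (Simplification: re-run k-means with k+1 rather than splitting only
    the failing centers, as the original algorithm does.)"""
    for k in range(1, max_k + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        if all(center_is_gaussian(X[labels == c]) for c in range(k)):
            return k
    return max_k

X, _ = make_blobs(n_samples=600, centers=3, random_state=1)
print("G-means-style estimate of k:", g_means_sketch(X))
```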
https://en.wikipedia.org/w/index.php?title=Automatic_clustering_algorithms&diff=1268056454&oldid=prev
Citation bot: Altered template type. Add: isbn, pages, date. | Use this bot. Report bugs. | Suggested by Abductive | Category:Clustering criteria | #UCB_Category 12/20
2025-01-08T00:01:12Z
− ''<ref>{{Cite journal|title=An algorithm for automatic recognition of cluster centers based on local density clustering - IEEE Conference Publication|language=en-US|doi=10.1109/CCDC.2017.7978726|s2cid=23267464 }}</ref>''
+ ''<ref>{{Cite book|title=An algorithm for automatic recognition of cluster centers based on local density clustering - IEEE Conference Publication|date=May 2017 |pages=1347–1351 |language=en-US|doi=10.1109/CCDC.2017.7978726|isbn=978-1-5090-4657-7 |s2cid=23267464 }}</ref>''
− <ref>{{Cite journal|title=AutoClustering: An estimation of distribution algorithm for the automatic generation of clustering algorithms - IEEE Conference Publication|language=en-US|doi=10.1109/CEC.2012.6252874|citeseerx=10.1.1.308.9977}}</ref>
+ <ref>{{Cite journal|title=AutoClustering: An estimation of distribution algorithm for the automatic generation of clustering algorithms - IEEE Conference Publication|date=June 2012 |pages=1–7 |language=en-US|doi=10.1109/CEC.2012.6252874|citeseerx=10.1.1.308.9977}}</ref>