Density-based clustering validation
Revision as of 18:06, 14 April 2025

Density-Based Clustering Validation (DBCV) is a metric for assessing the quality of clustering solutions, particularly those produced by density-based clustering algorithms such as DBSCAN, Mean shift, and OPTICS. It is well suited to concave and nested clusters, for which traditional metrics such as the Silhouette coefficient, Davies–Bouldin index, or Calinski–Harabasz index often fail to give meaningful evaluations.
Unlike traditional validation measures, which often assume compact and well-separated clusters, the DBCV index evaluates how well clusters are defined in terms of local density variations and structural coherence.
This metric was introduced in 2014 by Davoud Moulavi and colleagues.[2] It uses density-connectivity principles to quantify clustering structure, which makes it especially suitable for evaluating arbitrarily shaped clusters in concave datasets, where traditional metrics may be less reliable.
The DBCV index has been employed in bioinformatics,[3] ecology,[4] techno-economic analysis,[5] and health informatics,[6] as well as in numerous other fields.[7][8]
Definition
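The DBCV index evaluates clustering structures by analyzing the relationships between data points within and across clusters. Given a dataset <math>X = \{x_1, x_2, \ldots, x_n\}</math>, a density-based algorithm partitions it into K clusters <math>\{C_1, C_2, \ldots, C_K\}</math>. Each point <math>x_i</math> belongs to a specific cluster, denoted as <math>\mathrm{Cluster}(x_i)</math>.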
A key concept in the DBCV index is the notion of density-connected paths.[9] Two points within the same cluster are considered density-connected if there exists a sequence of intermediate points linking them, where each consecutive pair meets a predefined density criterion. The density-based distance between two points is the cost of the optimal path between them, that is, the path that minimizes the maximum local reachability distance along its trajectory.
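The "minimize the maximum step" idea can be illustrated with a small sketch. It assumes the local reachability distances are already available as a symmetric pairwise weight matrix (here plain absolute differences stand in for them), and the function name `minimax_path_distances` is hypothetical, not taken from any DBCV package. It uses a Floyd–Warshall-style update, which is simple rather than fast.

```python
import numpy as np

def minimax_path_distances(weights):
    """Return the matrix of minimax path costs for a symmetric weight matrix.

    Entry [i, j] is the smallest achievable value of the largest edge weight
    over all paths from i to j (a Floyd-Warshall variant, O(n^3) time).
    """
    d = np.array(weights, dtype=float)
    n = d.shape[0]
    for k in range(n):
        # A path routed through k costs the larger of its two legs i->k, k->j.
        via_k = np.maximum(d[:, k][:, None], d[k, :][None, :])
        # Keep whichever is cheaper: the current best path or the one via k.
        d = np.minimum(d, via_k)
    return d

# Four points on a line: three close together and one far away.
# Assumption: absolute differences stand in for local reachability distances.
points = np.array([0.0, 1.0, 2.0, 10.0])
pairwise = np.abs(points[:, None] - points[None, :])
density_dist = minimax_path_distances(pairwise)
```

In this toy example the direct distance between the first and third points is 2, but the path through the second point never takes a step longer than 1, so their density-based distance is 1. Reaching the isolated fourth point always requires a step of length 8, which is why chains of nearby points count as close while gaps between dense regions remain expensive.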
The DBCV index extends the Silhouette coefficient by redefining cluster cohesion and separation using density-based distances:
- Within-cluster density distance measures how closely a point is related to other members of its cluster:
- Nearest-cluster density distance quantifies how far a point is from the closest external cluster:
Using these measures, the DBCV index is computed as:
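As a sketch only, the silhouette-style construction described above can be written as follows. The symbols <math>a</math>, <math>b</math>, and <math>d_{\mathrm{dens}}</math>, the use of a minimum over points for the nearest-cluster distance, and the plain average over all points are assumptions made for illustration; the exact definitions in Moulavi et al.[2] may differ.

```latex
% Within-cluster density distance (cohesion) of point x_i
a(x_i) = \frac{1}{|\mathrm{Cluster}(x_i)| - 1}
         \sum_{\substack{x_j \in \mathrm{Cluster}(x_i) \\ j \neq i}}
         d_{\mathrm{dens}}(x_i, x_j)

% Nearest-cluster density distance (separation) of point x_i
b(x_i) = \min_{C_k \neq \mathrm{Cluster}(x_i)} \;
         \min_{x_j \in C_k} d_{\mathrm{dens}}(x_i, x_j)

% Silhouette-style aggregate over all n points
\mathrm{DBCV} = \frac{1}{n} \sum_{i=1}^{n}
                \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\, b(x_i)\}}
```

Under these assumptions each summand lies between -1 and +1 whenever <math>a(x_i)</math> and <math>b(x_i)</math> are non-negative and not both zero, so the average does as well.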
Explanation
DBCV index values range between -1 and +1:
- +1: Strongly cohesive and well-separated clusters.
- 0: Ambiguous clustering structure.
- -1: Poorly formed clusters or incorrect assignments.
By using density-based distances instead of traditional Euclidean measures, the DBCV index provides a more robust evaluation of clustering performance in datasets with irregular or non-spherical distributions.[2]
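How the sign of the score reflects clustering quality can be seen in a toy computation. The sketch below is not one of the implementations listed in this article. It assumes a silhouette-style form in which each point's cohesion is its mean density-based distance to the other members of its cluster and its separation is its smallest density-based distance to a point in another cluster, and it takes the density-based distances as a precomputed matrix. The function name `silhouette_style_score` is hypothetical.

```python
import numpy as np

def silhouette_style_score(dist, labels):
    """Average of (b - a) / max(a, b) over all points.

    dist   : symmetric matrix of precomputed density-based distances
    labels : cluster label of each point (at least two clusters required)
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # assumed convention: singletons score 0
            continue
        a = dist[i, same].mean()  # cohesion within the point's own cluster
        b = min(dist[i, labels == c].min()  # separation from nearest cluster
                for c in cluster_ids if c != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))

# Two tight pairs of points that are far apart from each other.
dist = np.array([[0, 1, 8, 8],
                 [1, 0, 8, 8],
                 [8, 8, 0, 1],
                 [8, 8, 1, 0]], dtype=float)

good_score = silhouette_style_score(dist, [0, 0, 1, 1])  # pairs kept together
bad_score = silhouette_style_score(dist, [0, 1, 0, 1])   # pairs split apart
```

Here `good_score` is 0.875 and `bad_score` is -0.875: grouping each tight pair together gives a value near +1, while splitting the pairs across clusters gives a value near -1, matching the interpretation of the range above.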
Implementations
- Python DBCV Implementation by Christopher Jenness[10]
- Python DBCV Implementation by Felipe Silva[11]
- R DBCV Implementation[12]
See also
- Cluster analysis
- DBSCAN
- Silhouette coefficient
- Dunn index
- Calinski–Harabasz index
- Davies–Bouldin index
References
1. GitHub FelSiq/DBCV, Fast Density-Based Clustering Validation (DBCV) Python package, https://github.com/FelSiq/DBCV
2. Moulavi, Davoud (2014), "Density-Based Clustering Validation", Proceedings of the 2014 SIAM International Conference on Data Mining, SIAM, pp. 839–847, doi:10.1137/1.9781611973440.96, ISBN 978-1-61197-344-0
3. Di Giovanni, Daniele (2023), "Using machine learning to explore shared genetic pathways and possible endophenotypes in autism spectrum disorder", Genes, 14 (2): 313, doi:10.3390/genes14020313, PMC 9956345, PMID 36833240
4. Poutaraud, Joachim (2024), "Meta-Embedded Clustering (MEC): A new method for improving clustering quality in unlabeled bird sound datasets", Ecological Informatics, 82, Elsevier: 102687, doi:10.1016/j.ecoinf.2024.102687
5. Shim, Jaehyun (2022), "Techno-economic analysis of micro-grid system design through climate region clustering", Energy Conversion and Management, 274, Elsevier: 116411, Bibcode:2022ECM...27416411S, doi:10.1016/j.enconman.2022.116411
6. Martínez, Rubén Yáñez (2023), "Spanish Corpora of tweets about COVID-19 vaccination for automatic stance detection", Information Processing & Management, 60 (3), Elsevier: 103294, doi:10.1016/j.ipm.2023.103294
7. Beer, Anna (2025), "DISCO: Internal Evaluation of Density-Based Clustering", arXiv:2503.00127
8. Veigel, Nadja (2025), "Content analysis of multi-annual time series of flood-related Twitter (X) data", Natural Hazards and Earth System Sciences, 25 (2), Göttingen, Germany: Copernicus Publications: 879–891, Bibcode:2025NHESS..25..879V, doi:10.5194/nhess-25-879-2025
9. Ester, M. (2009), Liu, L.; Özsu, M.T. (eds.), "Density-based Clustering", Encyclopedia of Database Systems, Boston, MA: Springer: 795–799, doi:10.1007/978-0-387-39940-9_605, ISBN 978-0-387-35544-3
10. "Christopherjenness/DBCV". GitHub.
11. "FelSiq/DBCV". GitHub.
12. Jaskowiak, Pablo Andretta (2024). "DBCVindex: Calculates the Density-Based Clustering Validation (DBCV) Index". doi:10.32614/CRAN.package.DBCVindex.