Biweight midcorrelation

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is based on the median rather than the mean, and is therefore less sensitive to outliers. It can serve as a robust alternative to other similarity metrics, such as Pearson correlation or mutual information ^[1].

Derivation

Here we find the biweight midcorrelation of two vectors $x$ and $y$ , each with $m$ items indexed by $i=1,2,\ldots ,m$ , so that the items are $x_{1},x_{2},\ldots ,x_{m}$ and $y_{1},y_{2},\ldots ,y_{m}$ . First, we define $\operatorname {med} (x)$ as the median of a vector $x$ and $\operatorname {mad} (x)$ as its median absolute deviation (MAD). We then define $u_{i}$ and $v_{i}$ as

{\begin{aligned}u_{i}&={\frac {x_{i}-\operatorname {med} (x)}{9\operatorname {mad} (x)}},\\v_{i}&={\frac {y_{i}-\operatorname {med} (y)}{9\operatorname {mad} (y)}}.\end{aligned}}

Now we define the weights $w_{i}^{(x)}$ and $w_{i}^{(y)}$ as,

{\begin{aligned}w_{i}^{(x)}&=\left(1-u_{i}^{2}\right)^{2}I\left(1-|u_{i}|\right)\\w_{i}^{(y)}&=\left(1-v_{i}^{2}\right)^{2}I\left(1-|v_{i}|\right)\end{aligned}}

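Here $I$ is the indicator function: $I(z)=1$ if $z>0$ and $I(z)=0$ otherwise, so any item with $|u_{i}|\geq 1$ (or $|v_{i}|\geq 1$ ) receives zero weight.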
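As an illustration of how the weights suppress outliers, the following minimal sketch computes $w_{i}^{(x)}$ for a small sample containing one extreme value. It assumes NumPy, takes $\operatorname {mad}$ to be the unscaled median absolute deviation, and treats $I$ as an indicator that is 1 for positive arguments and 0 otherwise. The helper name `biweight_weights` is hypothetical and not part of any library.

```python
import numpy as np

def biweight_weights(x):
    """Return the biweight weights w_i for a 1-D sample (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))  # unscaled MAD; assumed to be > 0 here
    u = (x - med) / (9.0 * mad)
    # Indicator I(1 - |u|): the weight is zero once |u| >= 1
    return np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]
weights = biweight_weights(sample)
print(weights)
```

In this sample the median is 3 and the MAD is 1, so the value 100 has $|u_{i}|>1$ and receives zero weight, while the median itself receives weight 1.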

Then we normalize the weighted deviations so that their squares sum to 1:

{\begin{aligned}{\tilde {x}}_{i}&={\frac {\left(x_{i}-\operatorname {med} (x)\right)w_{i}^{(x)}}{\sqrt {\sum _{j=1}^{m}\left[(x_{j}-\operatorname {med} (x))w_{j}^{(x)}\right]^{2}}}}\\{\tilde {y}}_{i}&={\frac {\left(y_{i}-\operatorname {med} (y)\right)w_{i}^{(y)}}{\sqrt {\sum _{j=1}^{m}\left[(y_{j}-\operatorname {med} (y))w_{j}^{(y)}\right]^{2}}}}.\end{aligned}}

Finally, we define biweight midcorrelation as,

\mathrm {bicor} \left(x,y\right)=\sum _{i=1}^{m}{\tilde {x}}_{i}{\tilde {y}}_{i}
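The full computation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes NumPy, an unscaled MAD, an indicator $I$ that is 1 for positive arguments and 0 otherwise, and normalization by the square root of the sum of squares, which is the choice that gives $\mathrm {bicor} (x,x)=1$ . The function names are hypothetical.

```python
import numpy as np

def _weighted_deviations(x):
    """Return (x_i - med(x)) * w_i for a 1-D sample (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    dev = x - np.median(x)
    mad = np.median(np.abs(dev))  # unscaled median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; the weights are undefined for this sample")
    u = dev / (9.0 * mad)
    # Biweight weights with the indicator I(1 - |u|) applied
    w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    return dev * w

def bicor(x, y):
    """Biweight midcorrelation of two equal-length 1-D samples (sketch)."""
    a = _weighted_deviations(x)
    b = _weighted_deviations(y)
    if a.shape != b.shape:
        raise ValueError("x and y must have the same length")
    # Assumed normalization: divide by the root of the sum of squares
    a = a / np.sqrt(np.sum(a**2))
    b = b / np.sqrt(np.sum(b**2))
    return float(np.sum(a * b))

x = np.arange(1.0, 11.0)
y = 2.0 * x + 1.0
y_outlier = y.copy()
y_outlier[-1] = 1000.0  # a single gross outlier in y

r_linear = bicor(x, y)
r_outlier = bicor(x, y_outlier)
print(r_linear, r_outlier)
```

For an exactly linear increasing relationship the sketch returns 1 up to floating-point error, and the outlier in `y_outlier` is assigned zero weight rather than dominating the result. The sketch raises an error when the MAD is zero and does not attempt to reproduce the options or fallbacks of any existing package.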

Applications

Biweight midcorrelation has been shown to be a robust measure of similarity in gene expression networks^[2], and is often used in weighted correlation network analysis.

Implementations

Biweight midcorrelation is implemented in the R statistical programming language as the function bicor in the WGCNA package^[3].

References

[1] Wilcox, Rand (January 12, 2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press. p. 455. ISBN 978-0123869838.
[2] Song, Lin (9 December 2012). "Comparison of co-expression measures: mutual information, correlation, and model based indices". BMC Bioinformatics. 13 (328). doi:10.1186/1471-2105-13-328. PMID 23217028.
[3] Langfelder, Peter. "bicor {WGCNA}". Inside R. Revolution Analytics. Retrieved 18 August 2015.
