User:NullPointerError/sandbox

The median trick is a generic approach that increases the chance that a probabilistic algorithm succeeds.^[1] Apparently first used in 1986^[2] by Jerrum et al.^[3] for approximate counting algorithms, the technique was later applied to a broad selection of classification and regression problems.^[2]

The idea of the median trick is simple: run a randomized algorithm with numeric output multiple times, and use the median of the obtained results as the final answer. For example, if an algorithm takes a set of data as input and has sublinear runtime, the same algorithm can be run repeatedly (or in parallel) over randomly sampled subsets of the input data, and, by the Chernoff inequality, the probability that the median of the results is far from the solution decreases exponentially in the number of runs.^[4] Similarly, for algorithms that are sublinear in space (e.g., counting the distinct elements of a stream), different randomizations of the algorithm (say, with different hash functions) may be used for repeated runs over the same data.^[5]
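The procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `noisy_estimator` is a hypothetical stand-in for a randomized algorithm (it is not taken from the cited sources), and its parameters `truth`, `eps`, and `p` are chosen arbitrarily.

```python
import random
import statistics


def median_trick(estimator, repetitions, rng):
    """Run `estimator` independently `repetitions` times and return the median."""
    return statistics.median(estimator(rng) for _ in range(repetitions))


def noisy_estimator(rng, truth=10.0, eps=1.0, p=0.75):
    """Hypothetical randomized algorithm (an assumption for this sketch).

    Returns a value within `eps` of `truth` with probability `p`;
    otherwise returns a value that is far off in a random direction.
    """
    if rng.random() < p:
        return truth + rng.uniform(-eps, eps)
    return truth + rng.choice([-1, 1]) * rng.uniform(50, 100)


rng = random.Random(0)                               # fixed seed for repeatability
single = noisy_estimator(rng)                        # one run: off target 25% of the time
combined = median_trick(noisy_estimator, 101, rng)   # median of 101 independent runs
print(single, combined)
```

A single run misses the target interval a quarter of the time, whereas the median of 101 runs misses it only if at least 51 of the runs are bad, which is extremely unlikely.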

Statement

Let ${\textstyle X_{1},\dots ,X_{n}}$ be independent random variables, and let ${\textstyle Y}$ be an unknown deterministic number.

Suppose that each random variable ${\textstyle X_{i}}$ falls within ${\textstyle [Y\pm \epsilon ]}$ with probability ${\textstyle \geq p}$ , where ${\textstyle p>1/2}$ is a constant. Then the median trick states that ${\textstyle Med(X_{i})\in [Y\pm \epsilon ]}$ with probability ${\textstyle \geq 1-e^{-2n(p-1/2)^{2}}}$ .

In other words, to ensure that ${\textstyle Y\in [Med(X_{i})\pm \epsilon ]}$ with probability ${\textstyle \geq 1-\delta }$ , it suffices to use ${\textstyle n\geq {\frac {\ln {\frac {1}{\delta }}}{2(p-1/2)^{2}}}}$ samples.
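As a worked illustration of the sample bound, the following sketch computes the smallest integer number of samples that satisfies it. The function name `repetitions_needed` and the example values of `p` and `delta` are assumptions made for this example.

```python
import math


def repetitions_needed(p, delta):
    """Smallest integer n with n >= ln(1/delta) / (2 * (p - 1/2)^2)."""
    if not 0.5 < p <= 1.0:
        raise ValueError("p must satisfy 1/2 < p <= 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must satisfy 0 < delta < 1")
    return math.ceil(math.log(1.0 / delta) / (2.0 * (p - 0.5) ** 2))


# Each run succeeds with probability >= 0.75; target failure probability 0.01.
n = repetitions_needed(0.75, 0.01)
print(n)  # 37, since ln(100) / 0.125 is about 36.84
```

The required number of samples grows only logarithmically in ${\textstyle 1/\delta }$ , but quadratically as ${\textstyle p}$ approaches ${\textstyle 1/2}$ .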

Proof

Let ${\textstyle Z_{i}}$ be the indicator variable for the event that ${\textstyle X_{i}\in [Y\pm \epsilon ]}$ . Then the event ${\textstyle Med(X_{i})\in [Y\pm \epsilon ]}$ fails to occur only if at most half of the ${\textstyle Z_{i}}$ equal 1, that is, ${\textstyle {\frac {1}{n}}\sum _{i}Z_{i}\leq 1/2}$ .

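Each ${\textstyle Z_{i}}$ takes values in ${\textstyle \{0,1\}}$ and has expectation ${\textstyle \geq p}$ , so this event requires the sample mean of the ${\textstyle Z_{i}}$ to fall below its expectation by at least ${\textstyle p-1/2}$ . By Hoeffding's inequality, this occurs with probability ${\textstyle \leq e^{-2n(p-1/2)^{2}}}$ .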
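The bound can be checked numerically in a simple special case. The sketch below assumes that every indicator ${\textstyle Z_{i}}$ equals 1 with probability exactly ${\textstyle p}$ , so that their sum is binomially distributed, and compares the exact probability that at most half of them equal 1 with the Hoeffding bound. The function names are illustrative.

```python
import math


def exact_failure_probability(n, p):
    """P(at most half of n independent Bernoulli(p) indicators equal 1)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1)  # k = 0, 1, ..., floor(n / 2)
    )


def hoeffding_bound(n, p):
    """Upper bound e^{-2 n (p - 1/2)^2} on the same probability."""
    return math.exp(-2 * n * (p - 0.5) ** 2)


for n in (5, 15, 45):
    print(n, exact_failure_probability(n, 0.75), hoeffding_bound(n, 0.75))
```

In this special case the exact failure probability lies below the bound for every ${\textstyle n}$ , and both decrease exponentially as ${\textstyle n}$ grows.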

Application to decision problems

The median trick can also be adapted to amplify the success probability for decision problems, where a yes-no answer is required instead of a number. In this case, the results of multiple independent runs of a probabilistic algorithm are combined using a majority vote.

The majority vote here is equivalent to taking the median of all outputs, after assigning the numbers 0 and 1 to the outputs no and yes, respectively. If the majority of runs answer yes, the median is 1; if the majority answer no, the median is 0. In the case of a tie, the result can be chosen arbitrarily between yes and no.
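The correspondence between majority vote and median can be illustrated with a short sketch. The function `majority_vote` is a hypothetical helper written for this example; it resolves ties to no, which is one of the arbitrary choices permitted above.

```python
import statistics


def majority_vote(answers):
    """Combine yes/no answers (booleans) from independent runs.

    Returns True (yes) if strictly more than half of the answers are yes.
    Ties are resolved to False (no); this choice is arbitrary.
    """
    votes = [1 if answer else 0 for answer in answers]
    return 2 * sum(votes) > len(votes)


answers = [True, False, True, True, False]  # outputs of five independent runs
votes = [1 if a else 0 for a in answers]

# With an odd number of runs there is no tie, and the majority vote
# coincides with the median of the 0/1 encoded answers.
print(majority_vote(answers), statistics.median(votes) == 1)
```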

References

^ Kogler & Traxler 2017, p. 378.
^ Kogler & Traxler 2017, p. 380.
^ Jerrum, Valiant & Vazirani 1986, p. 182, Lemma 6.1.
^ Wang & Han 2015, p. 11.
^ Wang & Han 2015, pp. 17–18, Median Trick in Boosting Confidence.

Sources

Kogler, Alexander; Traxler, Patrick (2017). "Parallel and Robust Empirical Risk Minimization via the Median Trick". Mathematical Aspects of Computer and Information Sciences. Cham: Springer International Publishing. doi:10.1007/978-3-319-72453-9_31. ISBN 978-3-319-72452-2. ISSN 0302-9743.
Jerrum, Mark R.; Valiant, Leslie G.; Vazirani, Vijay V. (1986). "Random generation of combinatorial structures from a uniform distribution". Theoretical Computer Science. 43. Elsevier BV: 169–188. doi:10.1016/0304-3975(86)90174-x. ISSN 0304-3975.
Wang, Dan; Han, Zhu (2015). "Basics for Sublinear Algorithms". Sublinear Algorithms for Big Data Applications. Cham: Springer International Publishing. doi:10.1007/978-3-319-20448-2_2. ISBN 978-3-319-20447-5. ISSN 2191-5768.
