Yates's correction for continuity

In statistics, Yates's correction for continuity (or Yates's chi-squared test) is a statistical test commonly used to analyze count data arranged in a contingency table, particularly when sample sizes are small. It tests whether two categorical variables are independent of each other. The correction modifies the standard Pearson chi-squared test to account for the fact that a continuous distribution (chi-squared) is used to approximate discrete data. It is applied almost exclusively to 2 × 2 contingency tables and involves subtracting 0.5 from the absolute difference between each observed and expected frequency before squaring.

Unlike the uncorrected Pearson chi-squared statistic, the corrected statistic is approximately unbiased for small sample sizes. It is more conservative than the uncorrected test: it increases the p-value and thus reduces the likelihood of rejecting the null hypothesis when it is true. Although Yates's correction is widely taught in introductory statistics courses, exact methods such as Fisher's exact test, which modern computers make practical, may be preferred for small samples in 2 × 2 tables, with Yates's correction serving as a middle ground between the uncorrected chi-squared test and Fisher's exact test.

The correction was first published by Frank Yates in 1934.^[1]

Correction for approximation error

Using the chi-squared distribution to interpret Pearson's chi-squared statistic requires one to assume that the discrete probabilities of the observed binomial frequencies in the table can be approximated by the continuous chi-squared distribution. This assumption is not exactly correct and introduces some error.

To reduce the approximation error, Frank Yates, an English statistician, suggested a correction for continuity that adjusts the formula for Pearson's chi-squared test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table.^[1] This reduces the chi-squared value obtained and thus increases its p-value.

The effect of Yates's correction is to prevent overestimation of statistical significance for small samples. The corrected formula is chiefly used when at least one cell of the table has an expected count smaller than 5.
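The effect on significance can be seen by computing both statistics for one small table. The sketch below is illustrative only: the table (8, 2, 3, 7) is made up, the helper names `chi2_sf_1df` and `pearson_and_yates` are hypothetical, and the p-value assumes the usual chi-squared reference distribution with 1 degree of freedom for a 2 × 2 table.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 1 degree of freedom."""
    # For 1 df, X = Z**2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def pearson_and_yates(a, b, c, d):
    """Return (uncorrected, Yates-corrected) chi-squared statistics for [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    plain = corrected = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: row total * column total / grand total.
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            plain += diff ** 2 / expected
            corrected += (diff - 0.5) ** 2 / expected
    return plain, corrected


# Made-up table; two of its expected counts are 4.5, i.e. smaller than 5.
plain, corrected = pearson_and_yates(8, 2, 3, 7)
p_plain, p_corrected = chi2_sf_1df(plain), chi2_sf_1df(corrected)
print(f"uncorrected: chi2={plain:.3f}, p={p_plain:.4f}")
print(f"Yates:       chi2={corrected:.3f}, p={p_corrected:.4f}")
```

For this table the uncorrected statistic is about 5.05 (p about 0.025) and the corrected one about 3.23 (p about 0.07), so only the uncorrected test would reject independence at the 5% level.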

The following is Yates's corrected version of Pearson's chi-squared statistic:

\chi _{\text{Yates}}^{2}=\sum _{i=1}^{N}{(|O_{i}-E_{i}|-0.5)^{2} \over E_{i}}

where:

O_i = an observed frequency

E_i = an expected (theoretical) frequency, asserted by the null hypothesis

N = number of cells (distinct events) in the table
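A direct transcription of the sum, as a minimal sketch: `yates_chi2` is a hypothetical helper name, the cells are passed as flat lists, and the caller must supply the expected counts implied by the null hypothesis. It applies the formula exactly as written, without clamping |O_i - E_i| - 0.5 at zero.

```python
def yates_chi2(observed, expected):
    """Yates-corrected chi-squared: sum over cells of (|O_i - E_i| - 0.5)**2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# The four cells of a made-up 2 x 2 table, flattened row by row, with the expected
# counts that independence implies (row total * column total / grand total).
observed = [8, 2, 3, 7]
expected = [5.5, 4.5, 5.5, 4.5]
print(yates_chi2(observed, expected))
```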

2 × 2 table

As a shortcut, for a 2 × 2 table with the following entries, where N = a + b + c + d is the grand total:

	S	F	Total
A	a	b	a+b
B	c	d	c+d
Total	a+c	b+d	N

\chi _{\text{Yates}}^{2}={\frac {N(|ad-bc|-N/2)^{2}}{(a+b)(c+d)(a+c)(b+d)}}.
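The shortcut works because, in a 2 × 2 table, every cell has the same |O_i - E_i| = |ad - bc|/N. The sketch below checks that the shortcut and the cell-by-cell sum agree exactly; the function names are hypothetical, and `fractions.Fraction` is used only so that equality can be tested without rounding error. Both forms are unclamped, as in the formulas as written.

```python
from fractions import Fraction


def yates_2x2_shortcut(a, b, c, d):
    """N(|ad - bc| - N/2)**2 / ((a+b)(c+d)(a+c)(b+d)), computed exactly."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - Fraction(n, 2)) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def yates_2x2_cellwise(a, b, c, d):
    """Sum of (|O - E| - 1/2)**2 / E over the four cells, computed exactly."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    cells = ((a, b), (c, d))
    total = Fraction(0)
    for i in range(2):
        for j in range(2):
            e = Fraction(rows[i] * cols[j], n)  # expected count under independence
            total += (abs(cells[i][j] - e) - Fraction(1, 2)) ** 2 / e
    return total


print(yates_2x2_shortcut(8, 2, 3, 7))  # → 320/99
print(yates_2x2_shortcut(8, 2, 3, 7) == yates_2x2_cellwise(8, 2, 3, 7))  # → True
```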

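When |ad - bc| < N/2, the quantity |ad - bc| - N/2 is negative, and squaring it makes the statistic grow again instead of falling to zero, so the correction overshoots. A variant clamps the corrected difference at zero:

\chi _{\text{Yates}}^{2}={\frac {N(\max(0,|ad-bc|-N/2))^{2}}{(a+b)(c+d)(a+c)(b+d)}}.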
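A minimal sketch of the clamped form, reading the denominator as the product of the row totals (a + b, c + d) and column totals (a + c, b + d). The helper names are hypothetical, and the p-value assumes a chi-squared reference distribution with 1 degree of freedom; the tables are made up.

```python
import math


def yates_2x2_clamped(a, b, c, d):
    """Yates-corrected chi-squared for [[a, b], [c, d]], with |ad - bc| - N/2 clamped at zero."""
    n = a + b + c + d
    excess = max(0.0, abs(a * d - b * c) - n / 2)
    return n * excess ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(stat):
    """Upper-tail chi-squared probability with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# |ad - bc| = 1 is below N/2 = 2.5: the unclamped formula would give 0.3125,
# whereas the clamped form gives 0.0 and hence p = 1.0.
print(yates_2x2_clamped(1, 1, 1, 2), p_value_1df(yates_2x2_clamped(1, 1, 1, 2)))
# |ad - bc| = 50 exceeds N/2 = 10, so the clamped and unclamped forms agree here.
print(yates_2x2_clamped(8, 2, 3, 7))
```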

Yates's correction should always be applied, as it tends to improve the accuracy of the p-value obtained.^[citation needed] However, with large sample sizes the correction has little effect on the value of the test statistic, and hence on the p-value.
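The shrinking effect at large sample sizes can be illustrated by scaling one made-up table (8, 2, 3, 7) by a factor k while keeping its proportions fixed. This is a sketch only; `chi2_2x2` and `relative_change` are hypothetical names, and the corrected branch uses the clamped 2 × 2 shortcut.

```python
def chi2_2x2(a, b, c, d, yates):
    """Chi-squared for [[a, b], [c, d]] via the 2 x 2 shortcut, optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def relative_change(k):
    """Fractional reduction of the statistic caused by the correction, table scaled by k."""
    table = (8 * k, 2 * k, 3 * k, 7 * k)
    plain = chi2_2x2(*table, yates=False)
    corrected = chi2_2x2(*table, yates=True)
    return (plain - corrected) / plain


for k in (1, 10, 100):
    print(f"N = {20 * k:5d}: correction lowers chi-squared by {100 * relative_change(k):.1f}%")
```

For this table the reduction works out to 0.4/k - 0.04/k², i.e. 36% at N = 20, about 4% at N = 200 and about 0.4% at N = 2000.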

References

[1] Yates, F. (1934). "Contingency table involving small numbers and the χ² test". Supplement to the Journal of the Royal Statistical Society 1(2): 217–235. JSTOR 2983604
