Root-finding algorithm

A root-finding algorithm is a numerical method, or algorithm, for finding a value x such that f(x) = 0, for a given function f. Such an x is called a root of the function f.

This article is concerned with finding scalar, real or complex roots, approximated as floating point numbers. Finding integer roots or exact algebraic roots are separate problems, whose algorithms have little in common with those discussed here. (See Diophantine equation for integer roots.)

Finding a root of f(x) − g(x) = 0 is the same as solving the equation f(x) = g(x). Here, x is called the unknown in the equation. Conversely, any equation can take the canonical form f(x) = 0, so equation solving is the same thing as computing (or finding) a root of a function.

Numerical root-finding methods use iteration, producing a sequence of numbers that hopefully converge towards a limit (the so-called "fixed point") which is a root. The first values of this series are initial guesses. The method computes subsequent values based on the old ones and the function f.

The behaviour of root-finding algorithms is studied in numerical analysis. Algorithms perform best when they take advantage of known characteristics of the given function. Thus an algorithm to find isolated real roots of a low-degree polynomial in one variable may bear little resemblance to an algorithm for complex roots of a "black-box" function which is not even known to be differentiable. Questions include ability to separate close roots, robustness in achieving reliable answers despite inevitable numerical errors, and rate of convergence.

Bracketing methods

Bisection method

The simplest root-finding algorithm is the bisection method. It works when f is a continuous function and requires prior knowledge of two initial guesses, a and b, such that f(a) and f(b) have opposite signs. Although it is reliable, it converges slowly, gaining one bit of accuracy with each iteration.
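
For illustration, a minimal Python sketch of the bisection loop described above (the function name and tolerance are illustrative, not from any standard library):

    def bisect(f, a, b, tol=1e-12):
        """Find a root of a continuous f in [a, b]; f(a) and f(b) must have opposite signs."""
        fa, fb = f(a), f(b)
        assert fa * fb < 0, "the initial guesses must bracket a root"
        while b - a > tol:
            m = (a + b) / 2
            fm = f(m)
            if fm == 0:
                return m
            # keep the half-interval whose endpoints still have opposite signs
            if fa * fm < 0:
                b, fb = m, fm
            else:
                a, fa = m, fm
        return (a + b) / 2

Each pass halves the bracket, which is the "one bit of accuracy per iteration" noted above.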

False position (regula falsi)

The false position method, also called the regula falsi method, is like the secant method. However, instead of retaining the last two points, it makes sure to keep one point on either side of the root. The false position method is faster than the bisection method and more robust than the secant method, but requires the two starting points to bracket the root. Ridders' method is a variant on the false-position method that also evaluates the function at the midpoint of the interval, giving faster convergence with similar robustness.
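
A comparable sketch of regula falsi, always keeping one endpoint on each side of the root (the stopping rule is chosen for illustration):

    def false_position(f, a, b, tol=1e-12, max_iter=200):
        """Regula falsi: like the secant method, but always keeps a bracket [a, b]."""
        fa, fb = f(a), f(b)
        assert fa * fb < 0, "the starting points must bracket the root"
        for _ in range(max_iter):
            # where the chord through (a, fa) and (b, fb) crosses zero
            c = b - fb * (b - a) / (fb - fa)
            fc = f(c)
            if abs(fc) < tol:
                return c
            # replace the endpoint whose function value has the same sign as fc
            if fa * fc < 0:
                b, fb = c, fc
            else:
                a, fa = c, fc
        return c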

Open methods

Newton's method (and similar derivative-based methods)

Newton's method assumes the function f to have a continuous derivative. Newton's method may not converge if started too far away from a root. However, when it does converge, it is faster than the bisection method, and is usually quadratic. Newton's method is also important because it readily generalizes to higher-dimensional problems. Newton-like methods with higher orders of convergence are the Householder's methods. The first one after Newton's method is Halley's method with cubic order of convergence.
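
A minimal sketch of the Newton iteration, assuming the derivative df is supplied by the caller:

    def newton(f, df, x0, tol=1e-12, max_iter=50):
        """Newton's method: quadratic convergence near a simple root."""
        x = x0
        for _ in range(max_iter):
            fx = f(x)
            if abs(fx) < tol:
                return x
            x = x - fx / df(x)   # may diverge if x0 is far from a root
        return x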

Secant method

Replacing the derivative in Newton's method with a finite difference, we get the secant method. This method does not require the computation (nor the existence) of a derivative, but the price is slower convergence (the order is approximately 1.6). A generalization of the secant method in higher dimensions is Broyden's method.
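
The same iteration with the derivative replaced by a chord slope gives the secant sketch below (again illustrative, not a library routine):

    def secant(f, x0, x1, tol=1e-12, max_iter=100):
        """Secant method: Newton's method with f' replaced by a finite difference."""
        f0, f1 = f(x0), f(x1)
        for _ in range(max_iter):
            if abs(f1) < tol:
                return x1
            x2 = x1 - f1 * (x1 - x0) / (f1 - f0)  # slope of the chord replaces f'(x1)
            x0, f0, x1, f1 = x1, f1, x2, f(x2)
        return x1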

Interpolation

The secant method also arises if one approximates the unknown function f by linear interpolation. When quadratic interpolation is used instead, one arrives at Muller's method. It converges faster than the secant method. A particular feature of this method is that the iterates x_n may become complex.
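
A hedged sketch of Muller's method; the complex square root is what lets the iterates leave the real axis:

    import cmath

    def muller(f, x0, x1, x2, tol=1e-12, max_iter=100):
        """Muller's method: fit a parabola through the last three iterates."""
        for _ in range(max_iter):
            f0, f1, f2 = f(x0), f(x1), f(x2)
            h1, h2 = x1 - x0, x2 - x1
            d1, d2 = (f1 - f0) / h1, (f2 - f1) / h2
            a = (d2 - d1) / (h2 + h1)           # leading coefficient of the parabola
            b = d2 + h2 * a
            c = f2
            disc = cmath.sqrt(b * b - 4 * a * c)
            # choose the sign that makes the denominator largest (smallest step)
            denom = b + disc if abs(b + disc) > abs(b - disc) else b - disc
            x3 = x2 - 2 * c / denom
            if abs(x3 - x2) < tol:
                return x3
            x0, x1, x2 = x1, x2, x3
        return x2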

Sidi's method allows for interpolation with an arbitrarily high degree polynomial. The higher the degree of the interpolating polynomial, the faster the convergence. Sidi's method allows for convergence with an order arbitrarily close to 2.

Inverse interpolation

This can be avoided by interpolating the inverse of f, resulting in the inverse quadratic interpolation method. Again, convergence is asymptotically faster than the secant method, but inverse quadratic interpolation often behaves poorly when the iterates are not close to the root.
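
One inverse-quadratic-interpolation update can be written as a single Lagrange formula for x as a function of y = f(x), evaluated at y = 0; a sketch of that single step:

    def iqi_step(x0, x1, x2, f0, f1, f2):
        """One inverse-quadratic-interpolation update: Lagrange interpolation of x
        as a function of y = f(x), evaluated at y = 0. Needs f0, f1, f2 distinct."""
        return (x0 * f1 * f2 / ((f0 - f1) * (f0 - f2))
                + x1 * f0 * f2 / ((f1 - f0) * (f1 - f2))
                + x2 * f0 * f1 / ((f2 - f0) * (f2 - f1)))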

Combinations of methods

Brent's method

Brent's method is a combination of the bisection method, the secant method and inverse quadratic interpolation. At every iteration, Brent's method decides which method out of these three is likely to do best, and proceeds by doing a step according to that method. This gives a robust and fast method, which therefore enjoys considerable popularity.
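
In practice Brent's method is usually taken from a library rather than coded by hand; for example, SciPy exposes it as scipy.optimize.brentq:

    import math
    from scipy.optimize import brentq

    # root of cos(x) - x on [0, 1]; brentq requires f to change sign on the bracket
    root = brentq(lambda x: math.cos(x) - x, 0.0, 1.0)
    print(root)   # approximately 0.739085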

Finding roots of polynomials

Much attention has been given to the special case that the function f is a polynomial; there exist root-finding algorithms exploiting the polynomial nature of f. For a univariate polynomial of degree less than five, there are closed-form solutions such as the quadratic formula which produce all roots. However, even this degree-two solution should be used with care to ensure numerical stability. Even more care must be taken with the degree-three and degree-four solutions because of their complexity. Higher-degree polynomials have no such general solution, according to the Abel–Ruffini theorem (Ruffini 1799, Abel 1824).

Finding one root at a time

The general idea is to find a root of the polynomial and then apply Horner's method to remove the corresponding factor according to the Ruffini rule.
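
A sketch of the deflation step: one pass of Horner's scheme both evaluates p(r) and, when r is a root, yields the coefficients of the deflated polynomial:

    def horner_deflate(coeffs, r):
        """Evaluate p at r and divide out the factor (x - r) by synthetic division.
        coeffs are ordered from the highest-degree term down.
        Returns (p(r), coefficients of the deflated polynomial)."""
        quotient = [coeffs[0]]
        for c in coeffs[1:]:
            quotient.append(c + r * quotient[-1])
        remainder = quotient.pop()   # this is p(r); near zero when r is a root
        return remainder, quotient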

This iterative scheme is numerically unstable: the approximation errors accumulate during the successive factorizations, so that the last roots are determined with a polynomial that deviates widely from a factor of the original polynomial. To reduce this error, it is advisable to find the roots in increasing order of magnitude.

Wilkinson's polynomial illustrates that high precision may be necessary when computing the roots of a polynomial given its coefficients: the problem of finding the roots from the coefficients is in general ill-conditioned.


The simplest method for finding a single root quickly is Newton's method. One can use Horner's method twice to efficiently evaluate both the polynomial and its first derivative; this combination is called Birge–Vieta's method. This method provides quadratic convergence for simple roots at the cost of two polynomial evaluations per step.
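
A sketch of one Birge–Vieta step, with the value and the derivative accumulated in the same Horner loop:

    def birge_vieta_step(coeffs, x):
        """One Newton step for a polynomial, with p(x) and p'(x) obtained
        from two interleaved Horner recurrences (the Birge–Vieta combination).
        coeffs are ordered from the highest-degree term down."""
        p, dp = coeffs[0], 0.0
        for c in coeffs[1:]:
            dp = dp * x + p      # Horner recurrence for the derivative
            p = p * x + c        # Horner recurrence for the value
        return x - p / dp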


Closely related to Newton's method are Halley's method and Laguerre's method. Using one additional Horner evaluation, the value of the second derivative is used to obtain methods of cubic convergence order for simple roots. If one starts from a point x close to a root and uses the same cost of six function evaluations, these methods perform two steps with a residual of O(|f(x)|^9), compared to three steps of Newton's method with a reduction O(|f(x)|^8), giving a slight advantage to these methods.

When applying these methods to polynomials with real coefficients and real starting points, Newton's and Halley's method stay on the real line. One has to choose complex starting points to find complex roots. In contrast, the Laguerre method, with a square root in its evaluation, can leave the real axis on its own.


Another class of methods is based on translating the problem of finding polynomial roots to the problem of finding eigenvalues of the companion matrix of the polynomial. In principle, one can use any eigenvalue algorithm to find the roots of the polynomial. However, for efficiency reasons one prefers methods that employ the structure of the matrix, that is, can be implemented in matrix-free form. Among these methods is the power method, whose application to the transpose of the companion matrix is the classical Bernoulli's method to find the root of greatest modulus. The inverse power method with shifts, which finds some smallest root first, is what drives the complex (cpoly) variant of the Jenkins–Traub method and gives it its numerical stability. Additionally, it is insensitive to multiple roots and has fast convergence with order 1 + Φ ≈ 2.6 even in the presence of clustered roots. This fast convergence comes with a cost of three Horner evaluations per step, resulting in a residual of O(|f(x)|^(2+3Φ)), which is slower than three steps of Newton's method.
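
NumPy's numpy.roots takes exactly this companion-matrix route; a small sketch that builds the matrix explicitly (one of several equivalent companion forms):

    import numpy as np

    def roots_via_companion(coeffs):
        """Roots of a polynomial as eigenvalues of its companion matrix.
        coeffs are ordered from the highest-degree term down."""
        c = np.asarray(coeffs, dtype=float) / coeffs[0]   # normalize to monic
        n = len(c) - 1
        companion = np.zeros((n, n))
        companion[1:, :-1] = np.eye(n - 1)   # subdiagonal of ones
        companion[:, -1] = -c[1:][::-1]      # last column from the coefficients
        return np.linalg.eigvals(companion)

    print(roots_via_companion([1.0, -3.0, 2.0]))   # roots of x^2 - 3x + 2: 1 and 2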

Finding roots in pairs

If the given polynomial has only real coefficients, one may wish to avoid computations with complex numbers. To that effect, one has to find quadratic factors for pairs of conjugate complex roots. The application of the multi-dimensional Newton's method to this task results in Bairstow's method. In the framework of inverse power iterations of the companion matrix, the double-shift method of Francis results in the real (rpoly) variant of the Jenkins–Traub method.

Finding all roots at once

The simple Durand–Kerner method and the slightly more complicated Aberth method simultaneously find all of the roots using only simple complex number arithmetic. Accelerated algorithms for multi-point evaluation and interpolation, similar to the fast Fourier transform, can help speed them up for large degrees of the polynomial. It is advisable to choose an asymmetric but evenly distributed set of initial points.
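
A minimal sketch of the Durand–Kerner iteration, using the customary starting points (0.4 + 0.9i)^k:

    def durand_kerner(coeffs, tol=1e-12, max_iter=200):
        """Durand–Kerner: update all n root approximations simultaneously.
        coeffs: highest-degree term first; normalized to monic internally."""
        c = [a / coeffs[0] for a in coeffs]
        n = len(c) - 1

        def p(x):
            result = 0
            for a in c:
                result = result * x + a   # Horner evaluation
            return result

        z = [(0.4 + 0.9j) ** k for k in range(n)]   # asymmetric, evenly spread start
        for _ in range(max_iter):
            new_z = []
            for i, zi in enumerate(z):
                denom = 1
                for j, zj in enumerate(z):
                    if j != i:
                        denom *= zi - zj
                new_z.append(zi - p(zi) / denom)
            if max(abs(a - b) for a, b in zip(new_z, z)) < tol:
                return new_z
            z = new_z
        return z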

Another method of this style is the Dandelin–Gräffe method (sometimes also falsely ascribed to Lobachevsky), which uses polynomial transformations to repeatedly and implicitly square the roots. This greatly magnifies the differences in magnitude between the roots. Applying Viète's formulas, one obtains easy approximations of the modulus of the roots, and with some more effort, of the roots themselves.
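
A sketch of one Gräffe root-squaring step, computed through the identity that p(x)·p(−x) is a polynomial in x² whose roots are the squares of the roots of p:

    import numpy as np

    def graeffe_step(coeffs):
        """One Dandelin–Gräffe step: returns the coefficients of a polynomial
        whose roots are the squares of the roots of the input.
        coeffs: highest-degree term first."""
        n = len(coeffs) - 1
        # coefficients of p(-x): flip the sign on odd powers of x
        neg = [(-1) ** (n - j) * c for j, c in enumerate(coeffs)]
        prod = np.polymul(coeffs, neg)   # p(x) * p(-x) has only even powers
        q = prod[::2]                    # substitute y = x^2
        return q if q[0] > 0 else -q

    print(graeffe_step([1.0, -3.0, 2.0]))   # roots 1, 2 become 1, 4: y^2 - 5y + 4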


Exclusion and enclosure methods

Several fast tests exist that tell if a segment of the real line or a region of the complex plane contains no roots. By bounding the modulus of the roots and recursively subdividing the initial region indicated by these bounds, one can isolate small regions that may contain roots and then apply other methods to locate them exactly.
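
A classical bound for the initial region is Cauchy's bound on the modulus of the roots; for example:

    def cauchy_bound(coeffs):
        """Cauchy's bound: every root z of the polynomial satisfies |z| <= bound.
        coeffs: highest-degree term first, leading coefficient nonzero."""
        return 1 + max(abs(a) for a in coeffs[1:]) / abs(coeffs[0])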

All these methods require finding the coefficients of shifted and scaled versions of the polynomial. For large degrees, FFT-based accelerated methods become viable.

For real roots, Sturm's theorem and Descartes' rule of signs, with its extension in the Budan–Fourier theorem, provide guides to locating and separating roots. Combining these with interval arithmetic and Newton's method yields robust and fast algorithms.
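
The sign-variation count behind Descartes' rule of signs takes only a few lines:

    def sign_variations(coeffs):
        """Number of sign changes in the coefficient sequence (zeros ignored).
        By Descartes' rule of signs, the number of positive real roots equals
        this count, or is smaller than it by an even number."""
        nonzero = [c for c in coeffs if c != 0]
        return sum(1 for a, b in zip(nonzero, nonzero[1:]) if a * b < 0)

    print(sign_variations([1, -3, 2]))   # 2: x^2 - 3x + 2 has two positive roots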

The Lehmer–Schur algorithm uses the Schur–Cohn test for circles, Wilf's global bisection algorithm uses a winding number computation for rectangular regions in the complex plane.

The splitting circle method uses FFT-based polynomial transformations to find large-degree factors corresponding to clusters of roots. The precision of the factorization is maximized using a Newton-type iteration. This method is useful for finding the roots of polynomials of high degree to arbitrary precision; it has almost optimal complexity in this setting.

Method based on the Budan–Fourier theorem or Sturm chains

When carried out in rational arithmetic, the methods in this class give provably complete enclosures of all real roots of a polynomial with rational coefficients by rational intervals. Testing an interval for real roots using Budan's theorem is computationally simple but may yield false positives; however, these are reliably detected during the subsequent transformation and refinement of the interval. The test based on Sturm chains is computationally more involved but always gives the exact number of real roots in the interval.

The algorithm for isolating the roots, using Descartes' rule of signs and Vincent's theorem, had been originally called modified Uspensky's algorithm by its inventors Collins and Akritas.[1] After going through names like "Collins–Akritas method" and "Descartes' method" (too confusing if one considers Fourier's article[2]), it was finally François Boulier, of Lille University, who gave it the name Vincent–Collins–Akritas (VCA) method,[3] p. 24, based on the fact that "Uspensky's method" does not exist[4] and neither does "Descartes' method".[5] This algorithm has been improved by Rouillier and Zimmerman,[6] and the resulting implementation is, to date, the fastest bisection method. It has the same worst-case complexity as Sturm's algorithm, but is almost always much faster. It is the default algorithm of Maple's root-finding function fsolve. Another method based on Vincent's theorem is the Vincent–Akritas–Strzeboński (VAS) method;[7] it has been shown[8] that the VAS (continued fractions) method is faster than the fastest implementation of the VCA (bisection) method,[6] a fact that was independently confirmed elsewhere;[9] more precisely, for Mignotte polynomials of high degree VAS is about 50,000 times faster than the fastest implementation of VCA. VAS is the default algorithm for root isolation in Mathematica, Sage, SymPy, and Xcas. See Budan's theorem for a description of the historical background of these methods. For a comparison between Sturm's method and VAS, use the functions realroot(poly) and time(realroot(poly)) of Xcas. By default, realroot uses the VAS method to isolate the real roots of poly; to use Sturm's method, write realroot(sturm, poly). See also the External links for a pointer to an iPhone/iPod/iPad application that does the same thing.

Finding multiple roots of polynomials

Most root-finding algorithms behave badly when there are multiple roots or very close roots. However, for polynomials whose coefficients are exactly given as integers or rational numbers, there is an efficient method to factorize them into factors that have only simple roots and whose coefficients are also exactly given. This method, called square-free factorization, is based on the fact that the multiple roots of a polynomial are the roots of the greatest common divisor of the polynomial and its derivative.

The square-free factorization of a polynomial p is a factorization p = p1 · p2^2 · … · pk^k where each pi is either 1 or a polynomial without multiple roots, and two distinct pi do not have any common root.

An efficient method to compute this factorization is Yun's algorithm.
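
A hedged sketch of Yun's algorithm on top of SymPy's exact polynomial arithmetic (Poly, gcd, exquo and diff are existing SymPy APIs; the function name is illustrative):

    from sympy import symbols, Poly

    x = symbols('x')

    def yun_square_free(expr):
        """Square-free factorization by Yun's algorithm.
        Returns (factor, multiplicity) pairs with p = prod factor**multiplicity."""
        p = Poly(expr, x)
        dp = p.diff(x)
        a = p.gcd(dp)            # carries all the repeated factors
        b = p.exquo(a)           # exact division: product of distinct factors
        c = dp.exquo(a)
        d = c - b.diff(x)
        factors, i = [], 1
        while b.degree() > 0:
            a = b.gcd(d)
            b = b.exquo(a)
            c = d.exquo(a)
            d = c - b.diff(x)
            if a.degree() > 0:
                factors.append((a.as_expr(), i))
            i += 1
        return factors

    # example: (x - 1) * (x - 2)**2 splits into (x - 1, 1) and (x - 2, 2)
    print(yun_square_free((x - 1) * (x - 2) ** 2))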

References

  1. Collins, George E.; Akritas, Alkiviadis G. (1976). Polynomial Real Root Isolation Using Descartes' Rule of Signs. SYMSAC '76, Proceedings of the third ACM symposium on Symbolic and algebraic computation. Yorktown Heights, NY, USA: ACM. pp. 272–275.
  2. Fourier, Jean Baptiste Joseph (1820). "Sur l'usage du théorème de Descartes dans la recherche des limites des racines" (PDF). Bulletin des Sciences, par la Société Philomatique de Paris: 156–165.
  3. Boulier, François (2010). Systèmes polynomiaux : que signifie " résoudre " ? (PDF). Université Lille 1.
  4. Akritas, Alkiviadis G. (1986). There's no "Uspensky's Method". In: Proceedings of the fifth ACM Symposium on Symbolic and Algebraic Computation (SYMSAC '86, Waterloo, Ontario, Canada), pp. 88–90.
  5. Akritas, Alkiviadis G. (2008). There is no "Descartes' method" (PDF). In: M.J. Wester and M. Beaudin (Eds), Computer Algebra in Education, AullonaPress, USA, pp. 19–35.
  6. Rouillier, F.; Zimmerman, P. (2004). "Efficient isolation of polynomial's real roots". Journal of Computational and Applied Mathematics. 162.
  7. Akritas, Alkiviadis G.; Strzeboński, A.W.; Vigklas, P.S. (2008). "Improving the performance of the continued fractions method using new bounds of positive roots" (PDF). Nonlinear Analysis: Modelling and Control. 13: 265–279.
  8. Akritas, Alkiviadis G.; Strzeboński, Adam W. (2005). "A Comparative Study of Two Real Root Isolation Methods" (PDF). Nonlinear Analysis: Modelling and Control. 10 (4): 297–304.
  9. Tsigaridas, P.E.; Emiris, I.Z. (2006). "Univariate polynomial real root isolation: Continued fractions revisited". LNCS. 4168: 817–828.