Misplaced Pages

Sterbenz lemma

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
Exact floating-point subtraction theorem

In floating-point arithmetic, the Sterbenz lemma or Sterbenz's lemma is a theorem giving conditions under which floating-point differences are computed exactly. It is named after Pat H. Sterbenz, who published a variant of it in 1974.

Sterbenz lemma — In a floating-point number system with subnormal numbers, if x {\displaystyle x} and y {\displaystyle y} are floating-point numbers such that

y 2 x 2 y , {\displaystyle {\frac {y}{2}}\leq x\leq 2y,}

then x y {\displaystyle x-y} is also a floating-point number. Thus, a correctly rounded floating-point subtraction

x y = fl ( x y ) = x y {\displaystyle x\ominus y=\operatorname {fl} (x-y)=x-y}

is computed exactly.

The Sterbenz lemma applies to IEEE 754, the most widely used floating-point number system in computers.

Proof

Let β {\displaystyle \beta } be the radix of the floating-point system and p {\displaystyle p} the precision.

Consider several easy cases first:

  • If x {\displaystyle x} is zero then x y = y {\displaystyle x-y=-y} , and if y {\displaystyle y} is zero then x y = x {\displaystyle x-y=x} , so the result is trivial because floating-point negation is always exact.
  • If x = y {\displaystyle x=y} the result is zero and thus exact.
  • If x < 0 {\displaystyle x<0} then we must also have y / 2 x < 0 {\displaystyle y/2\leq x<0} so y < 0 {\displaystyle y<0} . In this case, x y = ( x y ) {\displaystyle x-y=-(-x--y)} , so the result follows from the theorem restricted to x , y 0 {\displaystyle x,y\geq 0} .
  • If x y {\displaystyle x\leq y} , we can write x y = ( y x ) {\displaystyle x-y=-(y-x)} with x / 2 y 2 x {\displaystyle x/2\leq y\leq 2x} , so the result follows from the theorem restricted to x y {\displaystyle x\geq y} .

For the rest of the proof, assume 0 < y < x 2 y {\displaystyle 0<y<x\leq 2y} without loss of generality.

Write x , y > 0 {\displaystyle x,y>0} in terms of their positive integral significands s x , s y β p 1 {\displaystyle s_{x},s_{y}\leq \beta ^{p}-1} and minimal exponents e x , e y {\displaystyle e_{x},e_{y}} :

x = s x β e x p + 1 y = s y β e y p + 1 {\displaystyle {\begin{aligned}x&=s_{x}\cdot \beta ^{e_{x}-p+1}\\y&=s_{y}\cdot \beta ^{e_{y}-p+1}\end{aligned}}}

Note that x {\displaystyle x} and y {\displaystyle y} may be subnormal—we do not assume s x , s y β p 1 {\displaystyle s_{x},s_{y}\geq \beta ^{p-1}} .

The subtraction gives:

x y = s x β e x p + 1 s y β e y p + 1 = s x β e x e y β e y p + 1 s y β e y p + 1 = ( s x β e x e y s y ) β e y p + 1 . {\displaystyle {\begin{aligned}x-y&=s_{x}\cdot \beta ^{e_{x}-p+1}-s_{y}\cdot \beta ^{e_{y}-p+1}\\&=s_{x}\beta ^{e_{x}-e_{y}}\cdot \beta ^{e_{y}-p+1}-s_{y}\cdot \beta ^{e_{y}-p+1}\\&=(s_{x}\beta ^{e_{x}-e_{y}}-s_{y})\cdot \beta ^{e_{y}-p+1}.\end{aligned}}}

Let s = s x β e x e y s y {\displaystyle s'=s_{x}\beta ^{e_{x}-e_{y}}-s_{y}} . Since 0 < y < x {\displaystyle 0<y<x} we have:

  • e y e x {\displaystyle e_{y}\leq e_{x}} , so e x e y 0 {\displaystyle e_{x}-e_{y}\geq 0} , from which we can conclude β e x e y {\displaystyle \beta ^{e_{x}-e_{y}}} is an integer and therefore so is s = s x β e x e y s y {\displaystyle s'=s_{x}\beta ^{e_{x}-e_{y}}-s_{y}} ; and
  • x y > 0 {\displaystyle x-y>0} , so s > 0 {\displaystyle s'>0} .

Further, since x 2 y {\displaystyle x\leq 2y} , we have x y y {\displaystyle x-y\leq y} , so that

s β e y p + 1 = x y y = s y β e y p + 1 {\displaystyle s'\cdot \beta ^{e_{y}-p+1}=x-y\leq y=s_{y}\cdot \beta ^{e_{y}-p+1}}

which implies that

0 < s s y β p 1. {\displaystyle 0<s'\leq s_{y}\leq \beta ^{p}-1.}

Hence

x y = s β e y p + 1 , for 0 < s β p 1 , {\displaystyle x-y=s'\cdot \beta ^{e_{y}-p+1},\quad {\text{for}}\quad 0<s'\leq \beta ^{p}-1,}

so x y {\displaystyle x-y} is a floating-point number. ∎

Note: Even if x {\displaystyle x} and y {\displaystyle y} are normal, i.e., s x , s y β p 1 {\displaystyle s_{x},s_{y}\geq \beta ^{p-1}} , we cannot prove that s β p 1 {\displaystyle s'\geq \beta ^{p-1}} and therefore cannot prove that x y {\displaystyle x-y} is also normal. For example, the difference of the two smallest positive normal floating-point numbers x = ( β p 1 + 1 ) β e m i n p + 1 {\displaystyle x=(\beta ^{p-1}+1)\cdot \beta ^{e_{\mathrm {min} }-p+1}} and y = β p 1 β e m i n p + 1 {\displaystyle y=\beta ^{p-1}\cdot \beta ^{e_{\mathrm {min} }-p+1}} is x y = 1 β e m i n p + 1 {\displaystyle x-y=1\cdot \beta ^{e_{\mathrm {min} }-p+1}} which is necessarily subnormal. In floating-point number systems without subnormal numbers, such as CPUs in nonstandard flush-to-zero mode instead of the standard gradual underflow, the Sterbenz lemma does not apply.

Relation to catastrophic cancellation

The Sterbenz lemma may be contrasted with the phenomenon of catastrophic cancellation:

  • The Sterbenz lemma asserts that if x {\displaystyle x} and y {\displaystyle y} are sufficiently close floating-point numbers then their difference x y {\displaystyle x-y} is computed exactly by floating-point arithmetic x y = fl ( x y ) {\displaystyle x\ominus y=\operatorname {fl} (x-y)} , with no rounding needed.
  • The phenomenon of catastrophic cancellation is that if x ~ {\displaystyle {\tilde {x}}} and y ~ {\displaystyle {\tilde {y}}} are approximations to true numbers x {\displaystyle x} and y {\displaystyle y} —whether the approximations arise from prior rounding error or from series truncation or from physical uncertainty or anything else—the error of the difference x ~ y ~ {\displaystyle {\tilde {x}}-{\tilde {y}}} from the desired difference x y {\displaystyle x-y} is inversely proportional to x y {\displaystyle x-y} . Thus, the closer x {\displaystyle x} and y {\displaystyle y} are, the worse x ~ y ~ {\displaystyle {\tilde {x}}-{\tilde {y}}} may be as an approximation to x y {\displaystyle x-y} , even if the subtraction itself is computed exactly.

In other words, the Sterbenz lemma shows that subtracting nearby floating-point numbers is exact, but if the numbers one has are approximations then even their exact difference may be far off from the difference of numbers one wanted to subtract.

Use in numerical analysis

The Sterbenz lemma is instrumental in proving theorems on error bounds in numerical analysis of floating-point algorithms. For example, Heron's formula A = s ( s a ) ( s b ) ( s c ) {\displaystyle A={\sqrt {s(s-a)(s-b)(s-c)}}} for the area of triangle with side lengths a {\displaystyle a} , b {\displaystyle b} , and c {\displaystyle c} , where s = ( a + b + c ) / 2 {\displaystyle s=(a+b+c)/2} is the semi-perimeter, may give poor accuracy for long narrow triangles if evaluated directly in floating-point arithmetic. However, for a b c {\displaystyle a\geq b\geq c} , the alternative formula A = 1 4 ( a + ( b + c ) ) ( c ( a b ) ) ( c + ( a b ) ) ( a + ( b c ) ) {\displaystyle A={\frac {1}{4}}{\sqrt {{\bigl (}a+(b+c){\bigr )}{\bigl (}c-(a-b){\bigr )}{\bigl (}c+(a-b){\bigr )}{\bigl (}a+(b-c){\bigr )}}}} can be proven, with the help of the Sterbenz lemma, to have low forward error for all inputs.

References

  1. Muller, Jean-Michel; Brunie, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Joldes, Mioara; Lefèvre, Vincent; Melquiond, Guillaume; Revol, Nathalie; Torres, Serge (2018). Handbook of Floating-Point Arithmetic (2nd ed.). Gewerbestrasse 11, 6330 Cham, Switzerland: Birkhäuser. Lemma 4.1, p. 101. doi:10.1007/978-3-319-76526-6. ISBN 978-3-319-76525-9.{{cite book}}: CS1 maint: location (link)
  2. Sterbenz, Pat H. (1974). Floating-Point Computation. Englewood Cliffs, NJ, United States: Prentice-Hall. Theorem 4.3.1 and Corollary, p. 138. ISBN 0-13-322495-3.
  3. Kahan, W. (2014-09-04). "Miscalculating Area and Angles of a Needle-like Triangle" (PDF). Lecture Notes for Introductory Numerical Analysis Classes. Retrieved 2020-09-17.
  4. Goldberg, David (March 1991). "What every computer scientist should know about floating-point arithmetic". ACM Computing Surveys. 23 (1). New York, NY, United States: Association for Computing Machinery: 5–48. doi:10.1145/103162.103163. ISSN 0360-0300. S2CID 222008826. Retrieved 2020-09-17.
  5. Boldo, Sylvie (April 2013). Nannarelli, Alberto; Seidel, Peter-Michael; Tang, Ping Tak Peter (eds.). How to Compute the Area of a Triangle: a Formal Revisit. 21st IEEE Symposium on Computer Arithmetic. IEEE Computer Society. pp. 91–98. doi:10.1109/ARITH.2013.29. ISBN 978-0-7695-4957-6. ISSN 1063-6889.
Categories: