Floating-Point Operations

The Expression Language provides a number of operators that act on floating-point values:

  • The comparison operators, which result in a value of type Boolean:

    • Numerical Comparison Operators (<, <=, >, and >=).

    • Numeric Equality Operators (== and !=).

  • The numerical operators, which result in a value of type float, double, or BigDecimal:

    • Unary Plus Operator (+) and Unary Minus Operator (-).

    • Multiplicative Operators (*, /, and %).

    • Additive Operators (+ and -).

    • Postfix Increment Operator (++) and Prefix Increment Operator (++).

    • Postfix Decrement Operator (--) and Prefix Decrement Operator (--).

  • Conditional Operator (? :)

  • Field Access, using either a qualified name or a field access expression.

  • Method Invocation.

  • The cast operator, which can convert from a floating-point value to a value of any specified numeric type.

  • The string concatenation operator +, which, when given a String operand and a floating-point operand, converts the floating-point operand to a String representing its value in decimal form (without information loss), and then produces a newly created String by concatenating the two strings.

  • The prompt concatenation operator +, which, when given a Prompt operand and a floating-point operand, converts the floating-point operand to a Prompt representing its value in spoken form, and then produces a newly created Prompt that is the concatenation of the two prompts.

Except for the prompt concatenation operator, these operations are the same as those in Java. For a description of each operation, see the Java Language Specification: http://java.sun.com/docs/books/jls/second_edition/html/expressions.doc.html#44393.
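
Because these operations match their Java counterparts, a few of them can be sketched in plain Java. The class and method names below are hypothetical and chosen only for illustration. The prompt concatenation operator has no Java equivalent and is not shown.

```java
// Illustrative sketch only: FloatingPointOperators is a hypothetical name.
final class FloatingPointOperators {
    // A numerical comparison operator yields a boolean.
    static boolean isLess(double a, double b) {
        return a < b;
    }

    // The multiplicative operators are *, /, and %.
    static double remainder(double a, double b) {
        return a % b;
    }

    // A cast converts a floating-point value to another numeric type.
    static int toInt(double value) {
        return (int) value;
    }

    // String concatenation converts the floating-point operand to decimal text.
    static String label(String prefix, double value) {
        return prefix + value;
    }

    public static void main(String[] args) {
        System.out.println(isLess(1.5, 2.0));     // true
        System.out.println(remainder(7.5, 2.0));  // 1.5
        System.out.println(toInt(7.5));           // 7
        System.out.println(label("x = ", 0.25));  // x = 0.25
    }
}
```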

Other useful constructors, methods, and constants are predefined in the classes Float, Double, BigDecimal, and Math.

If at least one of the operands to a binary operator is of floating-point type, then the operation is a floating-point operation, even if the other is integral.

If at least one of the operands to a numerical operator is of type BigDecimal, then the operation is carried out using arbitrary-precision arithmetic, and the result of the numerical operator is a value of type BigDecimal. (If the other operand is not a BigDecimal, it is first widened to type double by numeric promotion.) Otherwise, if at least one of the operands is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. (If the other operand is not a double, it is first widened to type double by numeric promotion.) Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)
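
The float and double promotion rules can be observed in plain Java with a short sketch such as the following. The class name and the typeOf helper are hypothetical. Java's arithmetic operators do not accept BigDecimal operands, so the BigDecimal rule, which is specific to the Expression Language, is not shown.

```java
// Illustrative sketch only: NumericPromotion and typeOf are hypothetical names.
final class NumericPromotion {
    // Boxing the result of an expression reveals the type it was computed in.
    static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        // Both operands integral: integer arithmetic, so the quotient truncates.
        System.out.println(7 / 2);                 // 3
        // One float operand: the int is widened to float.
        System.out.println(7 / 2.0f);              // 3.5
        System.out.println(typeOf(1 + 2.0f));      // Float
        // One double operand: the float is widened to double.
        System.out.println(typeOf(1.0f + 2.0));    // Double
        // Widening 0.1f to double keeps the float's rounding error.
        System.out.println(0.1f + 0.0 == 0.1);     // false
    }
}
```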

Operators on floating-point numbers behave as specified by IEEE 754 (with the exception of the remainder operator). The Expression Language requires support of IEEE 754 denormalized floating-point numbers and gradual underflow, which make it easier to prove the properties of some numerical algorithms. Floating-point operations do not "flush to zero" if the calculated result is a denormalized number.
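
A minimal Java sketch of these two points, assuming the Expression Language matches Java here as stated above. Halving the smallest normal double yields a nonzero denormalized value, and the % operator gives a different answer from the IEEE 754 remainder operation, which Java exposes as Math.IEEEremainder. The class and method names are hypothetical.

```java
// Illustrative sketch only: Ieee754Behavior is a hypothetical name.
final class Ieee754Behavior {
    // Gradual underflow: the result drops below the normal range
    // but is not flushed to zero.
    static double halfOfSmallestNormal() {
        return Double.MIN_NORMAL / 2;
    }

    // The % operator truncates the implied quotient toward zero.
    static double truncatingRemainder(double a, double b) {
        return a % b;
    }

    public static void main(String[] args) {
        double denormal = halfOfSmallestNormal();
        System.out.println(denormal > 0.0);                   // true
        System.out.println(denormal < Double.MIN_NORMAL);     // true
        // 5/3 truncates to 1, so 5 % 3 is 5 - 1*3.
        System.out.println(truncatingRemainder(5.0, 3.0));    // 2.0
        // IEEE 754 remainder rounds 5/3 to the nearest integer, 2, giving 5 - 2*3.
        System.out.println(Math.IEEEremainder(5.0, 3.0));     // -1.0
    }
}
```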

The Java programming language requires that floating-point arithmetic behave as if every floating-point operator rounded its floating-point result to the result precision. Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode known as round to nearest.
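
The tie-breaking rule can be seen near 2^53, where adjacent double values are 2 apart, so adding an odd number lands exactly halfway between two representable values. The following Java sketch is an illustration with hypothetical names, not part of the specification.

```java
// Illustrative sketch only: RoundToNearest is a hypothetical name.
final class RoundToNearest {
    // 2^53: from here up to 2^54, adjacent doubles differ by 2.
    static final double TWO_POW_53 = 9007199254740992.0;

    static double add(double a, double b) {
        return a + b;
    }

    public static void main(String[] args) {
        // 2^53 + 1 is halfway between 2^53 and 2^53 + 2;
        // the tie goes to 2^53, whose least significant bit is zero.
        System.out.println(add(TWO_POW_53, 1.0) == TWO_POW_53);         // true
        // 2^53 + 3 is halfway between 2^53 + 2 and 2^53 + 4;
        // the tie goes to 2^53 + 4.
        System.out.println(add(TWO_POW_53, 3.0) == TWO_POW_53 + 4.0);   // true
        // An everyday inexact result: the rounded sum differs from the
        // double nearest to 0.3.
        System.out.println(0.1 + 0.2 == 0.3);                           // false
    }
}
```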

The language uses round toward zero when converting a floating-point value to an integer, which acts, in this case, as though the number were truncated, discarding the fractional part. Rounding toward zero chooses as its result the format's value closest to and no greater in magnitude than the infinitely precise result.
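
The difference between round toward zero and other rounding choices shows up on negative values. The Java sketch below, with hypothetical names, contrasts a cast with Math.floor and Math.round.

```java
// Illustrative sketch only: CastRounding is a hypothetical name.
final class CastRounding {
    // A cast to int rounds toward zero.
    static int truncate(double value) {
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(truncate(2.9));      // 2
        System.out.println(truncate(-2.9));     // -2
        // Math.floor rounds toward negative infinity instead.
        System.out.println(Math.floor(-2.9));   // -3.0
        // Math.round rounds to the nearest integer.
        System.out.println(Math.round(2.9));    // 3
    }
}
```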

Floating-point operators produce no exceptions. An operation that overflows produces a signed infinity, an operation that underflows produces a denormalized value or a signed zero, and an operation that has no mathematically definite result produces NaN. All numeric operations with NaN as an operand produce NaN as a result. As has already been described, NaN is unordered, so a numeric comparison operation involving one or two NaNs returns false and any != comparison involving NaN returns true, including x!=x when x is NaN.
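
The comparison rules for NaN can be checked directly. This Java sketch uses hypothetical names and relies on x != x being true only when x is NaN.

```java
// Illustrative sketch only: NanAndInfinity is a hypothetical name.
final class NanAndInfinity {
    // Only NaN is unequal to itself.
    static boolean isNaN(double x) {
        return x != x;
    }

    public static void main(String[] args) {
        double nan = 0.0 / 0.0;
        System.out.println(nan < 1.0);              // false
        System.out.println(nan >= 1.0);             // false
        System.out.println(nan == nan);             // false
        System.out.println(isNaN(nan));             // true
        // NaN propagates through numeric operations.
        System.out.println(isNaN(nan + 1.0));       // true
        // Overflow produces a signed infinity rather than an exception.
        System.out.println(Double.MAX_VALUE * 2);   // Infinity
        System.out.println(-Double.MAX_VALUE * 2);  // -Infinity
    }
}
```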

The example expression:


    {
        // An example of overflow:
        double d = 1e308;
        System.out.print("overflow produces infinity: ");
        System.out.println(d + "*10==" + d*10);
        // An example of gradual underflow:
        d = 1e-305 * Math.PI;
        System.out.print("gradual underflow: " + d + "\n ");
        for (int i = 0; i < 4; i++) {
            System.out.print(" " + (d /= 100000));
        }
        System.out.println();
        // An example of NaN:
        System.out.print("0.0/0.0 is Not-a-Number: ");
        d = 0.0/0.0;
        System.out.println(d);
        // An example of inexact results and rounding:
        System.out.print("inexact results with float:");
        for (int i = 0; i < 100; i++) {
            float z = 1.0f / i;
            if (z * i != 1.0f) {
                System.out.print(" " + i);
            }
        }
        System.out.println();
        // Another example of inexact results and rounding:
        System.out.print("inexact results with double:");
        for (int i = 0; i < 100; i++) {
            double z = 1.0 / i;
            if (z * i != 1.0) {
                System.out.print(" " + i);
            }
        }
        System.out.println();
        // An example of cast to integer rounding:
        System.out.print("cast to int rounds toward 0: ");
        d = 12345.6;
        System.out.println((int)d + " " + (int)(-d));
        return null;
    }

produces the output:


    overflow produces infinity: 1.0E308*10==Infinity
    gradual underflow: 3.141592653589793E-305
      3.1415926535898E-310 3.141592653E-315 3.142E-320 0.0
    0.0/0.0 is Not-a-Number: NaN
    inexact results with float: 0 41 47 55 61 82 83 94 97
    inexact results with double: 0 49 98
    cast to int rounds toward 0: 12345 -12345

This example demonstrates, among other things, that gradual underflow can result in a gradual loss of precision. The results when i is 0 involve division by zero, so that z becomes positive infinity, and z * 0 is NaN, which is not equal to 1.0.