Floating-Point Values

The floating-point types are float, double, and BigDecimal. The float and double types are conceptually associated with the single-precision 32-bit and double-precision 64-bit formats and operations of IEEE 754, as specified in IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985 (IEEE, New York). BigDecimal is associated with arbitrary-precision signed decimal values and operations.
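The following Java sketch shows the storage widths of the two binary types and why a decimal type is useful. It assumes that float and double behave like Java's primitives and that BigDecimal corresponds to java.math.BigDecimal; that mapping is an assumption, not something stated above. The value 0.1 has no exact binary representation, so adding 0.1 and 0.2 as doubles does not produce exactly 0.3, while the decimal sum does.

```java
import java.math.BigDecimal;

final class FloatingTypes {
    // Binary sum: 0.1 and 0.2 are each rounded to the nearest double before adding
    static double binarySum() {
        return 0.1 + 0.2;
    }

    // Decimal sum: BigDecimal keeps the decimal digits 0.1 and 0.2 exactly
    static BigDecimal decimalSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    public static void main(String[] args) {
        System.out.println("float width in bits:  " + Float.SIZE);   // 32
        System.out.println("double width in bits: " + Double.SIZE);  // 64
        System.out.println("double 0.1 + 0.2:     " + binarySum());  // 0.30000000000000004
        System.out.println("BigDecimal 0.1 + 0.2: " + decimalSum()); // 0.3
    }
}
```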

The IEEE 754 standard includes not only positive and negative numbers that consist of a sign and magnitude, but also positive and negative zeros, positive and negative infinities, and special Not-a-Number values (hereafter abbreviated NaN). A NaN value is used to represent the result of certain invalid operations such as dividing zero by zero. NaN constants of both float and double type are predefined as Float.NaN and Double.NaN.
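A minimal Java sketch of these special values, assuming the language's float and double behave like Java's primitives and that its Float.NaN and Double.NaN constants correspond to java.lang.Float.NaN and java.lang.Double.NaN. Dividing zero by zero is one of the invalid operations that yields NaN.

```java
final class SpecialValues {
    // Produces NaN through an invalid operation: zero divided by zero
    static double zeroOverZero() {
        double zero = 0.0;
        return zero / zero;
    }

    public static void main(String[] args) {
        System.out.println(zeroOverZero());            // NaN
        System.out.println(Double.isNaN(Double.NaN));  // true
        System.out.println(Float.isNaN(Float.NaN));    // true
        // Infinities and signed zeros are ordinary values of the type
        System.out.println(Double.POSITIVE_INFINITY + " " + Double.NEGATIVE_INFINITY); // Infinity -Infinity
        System.out.println(-0.0);                      // -0.0
    }
}
```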

The finite nonzero values of any floating-point value set can all be expressed in the form $s \cdot m \cdot 2^{e-N+1}$, where $s$ is +1 or -1, $m$ is a positive integer less than $2^{N}$, and $e$ is an integer between $E_{min} = -(2^{K-1}-2)$ and $E_{max} = 2^{K-1}-1$, inclusive, and where $N$ and $K$ are parameters that depend on the value set. Some values can be represented in this form in more than one way. For example, suppose that a value $v$ in a value set is represented in this form using certain values of $s$, $m$, and $e$; if $m$ is even and $e$ is less than $E_{max}$, one could halve $m$ and increase $e$ by 1 to produce a second representation of the same value $v$. A representation in this form is called normalized if $m \ge 2^{N-1}$; otherwise the representation is said to be denormalized. If a value in a value set cannot be represented in such a way that $m \ge 2^{N-1}$, then the value is said to be a denormalized value, because it has no normalized representation.
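The representation can be checked mechanically. The sketch below assumes the double value set ($N = 53$, $K = 11$) and Java's IEEE 754 bit layout for double; the class and method names are hypothetical. It extracts $s$, $m$, and $e$ from a finite nonzero double and rebuilds the value as $s \cdot m \cdot 2^{e-N+1}$. For normal values the hidden leading bit is restored, giving a normalized $m \ge 2^{N-1}$; for denormalized values the stored exponent field is 0, $e$ is $E_{min}$, and $m < 2^{N-1}$.

```java
final class Decompose {
    static final int N = 53;          // precision of the double value set
    static final int E_MIN = -1022;   // minimum exponent of the double value set

    // Returns {s, m, e} with v == s * m * 2^(e - N + 1); v must be finite and nonzero
    static long[] decompose(double v) {
        if (v == 0.0 || Double.isNaN(v) || Double.isInfinite(v)) {
            throw new IllegalArgumentException("finite nonzero value required");
        }
        long bits = Double.doubleToRawLongBits(v);
        long s = (bits < 0) ? -1 : 1;                // sign bit
        int biased = (int) ((bits >>> 52) & 0x7FF);  // 11-bit exponent field (K = 11)
        long fraction = bits & ((1L << 52) - 1);     // 52 stored significand bits
        if (biased == 0) {
            // Denormalized: no hidden bit, exponent fixed at E_min
            return new long[] {s, fraction, E_MIN};
        }
        // Normalized: restore the hidden leading 1 bit and remove the bias of 1023
        return new long[] {s, (1L << 52) | fraction, biased - 1023};
    }

    // Rebuilds s * m * 2^(e - N + 1); exact because m < 2^53 fits in a double
    static double recompose(long[] sme) {
        return sme[0] * Math.scalb((double) sme[1], (int) (sme[2] - N + 1));
    }

    public static void main(String[] args) {
        double[] samples = {1.0, -0.75, Double.MIN_VALUE, Double.MAX_VALUE};
        for (double v : samples) {
            long[] sme = decompose(v);
            boolean normalized = sme[1] >= (1L << (N - 1));
            System.out.printf("%s: s=%d m=%d e=%d normalized=%b roundTrip=%b%n",
                    v, sme[0], sme[1], sme[2], normalized, recompose(sme) == v);
        }
    }
}
```

Double.MIN_VALUE decomposes to $m = 1$ and $e = E_{min}$, so it is a denormalized value; the other samples are normalized.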

The constraints on the parameters N and K (and on the derived parameters E_min and E_max) for the two floating-point value sets are summarized in Table - Floating-Point Limit Value Sets.

Table - Floating-Point Limit Value Sets
Parameter	Float	Double
N	24	53
K	8	11
E_max	+127	+1023
E_min	-126	-1022
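Assuming the language's float and double correspond to Java's primitives, the table can be cross-checked against constants in java.lang.Float and java.lang.Double. The sketch below derives $N$ from the spacing of values near 1.0 (the unit in the last place of 1.0 is $2^{-(N-1)}$), derives $K$ from $E_{max} = 2^{K-1}-1$, and confirms that the smallest positive value is $2^{E_{min}-N+1}$, that is, $m = 1$ and $e = E_{min}$. The helper names are hypothetical.

```java
final class ValueSetLimits {
    // N: ulp(1.0) = 2^-(N-1), so N = 1 - (exponent of ulp(1.0))
    static int precision(double ulpOfOne) {
        return 1 - Math.getExponent(ulpOfOne);
    }

    // K: E_max + 1 = 2^(K-1), a power of two with K - 1 trailing zero bits
    static int exponentWidth(int eMax) {
        return Integer.numberOfTrailingZeros(eMax + 1) + 1;
    }

    // Smallest positive (denormalized) value: m = 1 and e = E_min, i.e. 2^(E_min - N + 1)
    static double smallestDenormalized(int eMin, int n) {
        return Math.scalb(1.0, eMin - n + 1);
    }

    public static void main(String[] args) {
        int nFloat = precision(Math.ulp(1.0f));
        int nDouble = precision(Math.ulp(1.0));
        System.out.println("Parameter\tfloat\tdouble");
        System.out.println("N\t" + nFloat + "\t" + nDouble);
        System.out.println("K\t" + exponentWidth(Float.MAX_EXPONENT) + "\t" + exponentWidth(Double.MAX_EXPONENT));
        System.out.println("E_max\t" + Float.MAX_EXPONENT + "\t" + Double.MAX_EXPONENT);
        System.out.println("E_min\t" + Float.MIN_EXPONENT + "\t" + Double.MIN_EXPONENT);
        System.out.println(Float.MIN_VALUE == smallestDenormalized(Float.MIN_EXPONENT, nFloat));    // true
        System.out.println(Double.MIN_VALUE == smallestDenormalized(Double.MIN_EXPONENT, nDouble)); // true
    }
}
```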

Each of the two value sets includes not only the finite nonzero values that are ascribed to it above, but also NaN values and the four values positive zero, negative zero, positive infinity, and negative infinity. The elements of the float value set are exactly the values that can be represented using the single floating-point format defined in the IEEE 754 standard. The elements of the double value set are exactly the values that can be represented using the double floating-point format defined in the IEEE 754 standard.

Except for NaN, floating-point values are ordered; arranged from smallest to largest, they are negative infinity, negative finite nonzero values, positive and negative zero, positive finite nonzero values, and positive infinity. Positive zero and negative zero compare equal; thus the result of the expression 0.0==-0.0 is true and the result of 0.0>-0.0 is false. But other operations can distinguish positive and negative zero; for example, 1.0/0.0 has the value positive infinity, while the value of 1.0/-0.0 is negative infinity.
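A short Java sketch of the ordering and of signed zero, assuming Java double semantics. The hypothetical helper isNegativeZero uses division, one of the operations that can tell the two zeros apart even though they compare equal.

```java
final class SignedZero {
    // Equal to zero, but dividing 1.0 by it gives negative infinity
    static boolean isNegativeZero(double x) {
        return x == 0.0 && 1.0 / x < 0;
    }

    public static void main(String[] args) {
        System.out.println(0.0 == -0.0);  // true
        System.out.println(0.0 > -0.0);   // false
        System.out.println(1.0 / 0.0);    // Infinity
        System.out.println(1.0 / -0.0);   // -Infinity
        System.out.println(isNegativeZero(-0.0) + " " + isNegativeZero(0.0)); // true false

        // Ordering from smallest to largest, excluding NaN
        boolean ordered = Double.NEGATIVE_INFINITY < -Double.MAX_VALUE
                && -Double.MIN_VALUE < -0.0
                && 0.0 < Double.MIN_VALUE
                && Double.MAX_VALUE < Double.POSITIVE_INFINITY;
        System.out.println(ordered);      // true
    }
}
```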

NaN is unordered, so the numerical comparison operators <, <=, >, and >= return false if either or both operands are NaN. The equality operator == returns false if either operand is NaN, and the inequality operator != returns true if either operand is NaN. In particular, x!=x is true if and only if x is NaN, and (x<y) == !(x>=y) is false if x or y is NaN.
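The comparison rules can be observed directly in Java, assuming the language's comparison operators follow Java's semantics for double. The hypothetical helper isNaN relies on the x != x property stated above.

```java
final class NaNComparisons {
    // x != x holds only when x is NaN
    static boolean isNaN(double x) {
        return x != x;
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        double y = 1.0;
        System.out.println(nan < y);                   // false
        System.out.println(nan >= y);                  // false
        System.out.println(nan == nan);                // false
        System.out.println(nan != nan);                // true
        System.out.println((nan < y) == !(nan >= y));  // false
        System.out.println(isNaN(nan) + " " + isNaN(y)); // true false
    }
}
```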

Any value of a floating-point type may be cast to or from any numeric type, and a String value may be cast to a floating-point type. If the string cannot be parsed as a value of the target floating-point type, an ExpressionTargetException with a nested NumberFormatException is thrown. There are no casts between floating-point types and the type Boolean.
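A minimal Java sketch of the String cast rule, in which a parse failure is wrapped so that the NumberFormatException is preserved as the nested cause. The ExpressionTargetException class here is a hypothetical stand-in for the language's own exception type, and castToDouble and castToBigDecimal are hypothetical helpers; only Double.parseDouble and the BigDecimal(String) constructor are real JDK calls.

```java
import java.math.BigDecimal;

final class StringCasts {
    // Hypothetical stand-in for the language's exception; only the nesting of the cause is modeled
    static final class ExpressionTargetException extends RuntimeException {
        ExpressionTargetException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // String-to-double cast: a parse failure is rethrown with the NumberFormatException as its cause
    static double castToDouble(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            throw new ExpressionTargetException("cannot cast \"" + s + "\" to double", e);
        }
    }

    // String-to-BigDecimal cast with the same wrapping behavior
    static BigDecimal castToBigDecimal(String s) {
        try {
            return new BigDecimal(s);
        } catch (NumberFormatException e) {
            throw new ExpressionTargetException("cannot cast \"" + s + "\" to BigDecimal", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(castToDouble("2.5"));       // 2.5
        System.out.println(castToBigDecimal("0.10"));  // 0.10
        try {
            castToDouble("abc");
        } catch (ExpressionTargetException e) {
            System.out.println(e.getCause() instanceof NumberFormatException); // true
        }
    }
}
```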