Floating-Point Values
The floating-point types are float, double, and BigDecimal. The float and double types are conceptually associated with the single-precision 32-bit and double-precision 64-bit formats, values, and operations specified in IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985 (IEEE, New York). BigDecimal represents arbitrary-precision signed decimal values and operations.
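As a rough illustration, the sketch below assumes these types behave like Java's primitive float and double and java.math.BigDecimal. The Float.NaN and Double.NaN constants suggest a Java host, but that is an assumption. The class name FloatingPointTypes and its methods are hypothetical. The sketch shows the bit widths of the two binary formats and why a decimal type can represent 0.1 + 0.2 exactly while a binary type cannot.

```java
import java.math.BigDecimal;

class FloatingPointTypes {
    // Binary double: 0.1, 0.2 and 0.3 are all rounded, so the sum misses 0.3
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // BigDecimal: the decimal digits are stored exactly, so the sum is exactly 0.3
    static boolean bigDecimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    public static void main(String[] args) {
        System.out.println("float bits:  " + Float.SIZE);                             // 32
        System.out.println("double bits: " + Double.SIZE);                            // 64
        System.out.println("double 0.1 + 0.2 == 0.3: " + doubleSumIsExact());         // false
        System.out.println("BigDecimal 0.1 + 0.2 == 0.3: " + bigDecimalSumIsExact()); // true
    }
}
```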
The IEEE 754 standard includes not only positive and negative numbers that consist of a sign and magnitude, but also positive and negative zeros, positive and negative infinities, and special Not-a-Number values (hereafter abbreviated NaN). A NaN value is used to represent the result of certain invalid operations such as dividing zero by zero. NaN constants of both float and double type are predefined as Float.NaN and Double.NaN.
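The special values described above can be produced and tested directly. This sketch again assumes Java semantics for float and double; SpecialValues is a hypothetical class name.

```java
class SpecialValues {
    public static void main(String[] args) {
        double nan = 0.0 / 0.0;                       // invalid operation yields NaN
        System.out.println(nan);                      // NaN
        System.out.println(Double.isNaN(nan));        // true
        System.out.println(Float.isNaN(Float.NaN));   // true: predefined float NaN
        System.out.println(Double.isNaN(Double.NaN)); // true: predefined double NaN
        System.out.println(1.0 / 0.0);                // Infinity
        System.out.println(-1.0 / 0.0);               // -Infinity
        System.out.println(-0.0);                     // -0.0: negative zero keeps its sign
    }
}
```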
The finite nonzero values of the float and double value sets can all be expressed in the form $s \cdot m \cdot 2^{e-N+1}$, where $s$ is $+1$ or $-1$, $m$ is a positive integer less than $2^{N}$, and $e$ is an integer between $E_{\min} = -(2^{K-1}-2)$ and $E_{\max} = 2^{K-1}-1$, inclusive, and where $N$ and $K$ are parameters that depend on the value set. Some values can be represented in this form in more than one way. For example, suppose that a value $v$ in a value set is represented in this form using certain values for $s$, $m$, and $e$. If $m$ is even and $e$ is less than $E_{\max}$, one could halve $m$ and increase $e$ by 1 to produce a second representation for the same value $v$. A representation in this form is called normalized if $m \ge 2^{N-1}$; otherwise the representation is said to be denormalized. If a value in a value set cannot be represented in such a way that $m \ge 2^{N-1}$, then the value is said to be a denormalized value, because it has no normalized representation.
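To make the form concrete for the double value set (N = 53, K = 11), the following sketch assumes Java's double uses the standard IEEE 754 binary64 layout. It extracts s, m, and e from the raw bits with Double.doubleToRawLongBits. The class DoubleDecomposition and its methods are hypothetical. For normalized values the implicit leading 1 bit is restored, which makes m at least 2^52. Denormalized values keep the raw 52-bit fraction as m and use e = E_min.

```java
class DoubleDecomposition {
    static final int N = 53;        // significand width of the double value set
    static final int E_MIN = -1022; // -(2^(K-1) - 2) with K = 11
    static final int BIAS = 1023;   // IEEE 754 binary64 exponent bias (equals E_max)

    /** Returns {s, m, e} with v == s * m * 2^(e - N + 1), for finite nonzero v. */
    static long[] decompose(double v) {
        if (v == 0.0 || Double.isNaN(v) || Double.isInfinite(v)) {
            throw new IllegalArgumentException("finite nonzero value required: " + v);
        }
        long bits = Double.doubleToRawLongBits(v);
        long s = (bits < 0) ? -1 : 1;              // top bit is the sign
        int field = (int) ((bits >>> 52) & 0x7FF); // 11-bit biased exponent field
        long fraction = bits & 0xFFFFFFFFFFFFFL;   // low 52 bits
        if (field == 0) {
            // Denormalized: no implicit leading 1, exponent fixed at E_MIN
            return new long[] {s, fraction, E_MIN};
        }
        // Normalized: restore the implicit leading 1, so m >= 2^(N-1)
        return new long[] {s, fraction | (1L << (N - 1)), field - BIAS};
    }

    /** Rebuilds s * m * 2^(e - N + 1); the result is exactly representable, so scalb does not round. */
    static double recompose(long s, long m, long e) {
        return s * Math.scalb((double) m, (int) (e - N + 1));
    }

    public static void main(String[] args) {
        double[] samples = {1.0, -0.1, Double.MAX_VALUE, Double.MIN_VALUE};
        for (double v : samples) {
            long[] p = decompose(v);
            boolean normalized = p[1] >= (1L << (N - 1));
            System.out.printf("%s: s=%d m=%d e=%d normalized=%b round-trip=%b%n",
                    v, p[0], p[1], p[2], normalized, recompose(p[0], p[1], p[2]) == v);
        }
    }
}
```

Double.MIN_VALUE comes out as s = 1, m = 1, e = -1022, a denormalized value, while Double.MAX_VALUE has e = 1023 = E_max and m = 2^53 - 1.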
The constraints on the parameters $N$ and $K$ (and on the derived parameters $E_{\min}$ and $E_{\max}$) for the two floating-point value sets are summarized in Table - Floating-Point Limit Value Sets.
Parameter | float | double |
---|---|---|
$N$ | 24 | 53 |
$K$ | 8 | 11 |
$E_{\max}$ | +127 | +1023 |
$E_{\min}$ | -126 | -1022 |
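The table entries follow from the formulas for E_min and E_max given K, and N can be recovered from the spacing of values just above 1.0, since ulp(1.0) = 2^-(N-1). The sketch below, with the hypothetical class ValueSetParameters, assumes Java and checks the computed exponents against the standard Float.MIN_EXPONENT, Float.MAX_EXPONENT, Double.MIN_EXPONENT, and Double.MAX_EXPONENT constants.

```java
class ValueSetParameters {
    // E_max = 2^(K-1) - 1
    static int eMax(int k) { return (1 << (k - 1)) - 1; }

    // E_min = -(2^(K-1) - 2)
    static int eMin(int k) { return -((1 << (k - 1)) - 2); }

    // ulp(1.0) == 2^-(N-1), so N = 1 - exponent(ulp(1.0))
    static int precisionOfFloat()  { return 1 - Math.getExponent(Math.ulp(1.0f)); }
    static int precisionOfDouble() { return 1 - Math.getExponent(Math.ulp(1.0)); }

    public static void main(String[] args) {
        System.out.printf("float:  N=%d Emin=%d Emax=%d (Float.MIN_EXPONENT=%d, Float.MAX_EXPONENT=%d)%n",
                precisionOfFloat(), eMin(8), eMax(8), Float.MIN_EXPONENT, Float.MAX_EXPONENT);
        System.out.printf("double: N=%d Emin=%d Emax=%d (Double.MIN_EXPONENT=%d, Double.MAX_EXPONENT=%d)%n",
                precisionOfDouble(), eMin(11), eMax(11), Double.MIN_EXPONENT, Double.MAX_EXPONENT);
    }
}
```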
Each of the two value sets includes not only the finite nonzero values ascribed to it above, but also NaN values and the four values positive zero, negative zero, positive infinity, and negative infinity. The elements of the float value set are exactly the values that can be represented using the single-precision floating-point format defined in the IEEE 754 standard, and the elements of the double value set are exactly the values that can be represented using the double-precision floating-point format defined in that standard.
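Because the float value set corresponds exactly to the IEEE 754 single format, each special value has a fixed bit pattern, and a double such as 0.1 generally has no exact counterpart in the float value set. The sketch below assumes Java's Float.floatToRawIntBits and Float.floatToIntBits (the latter maps every NaN to one canonical pattern). ValueSetMembers is a hypothetical class name.

```java
class ValueSetMembers {
    public static void main(String[] args) {
        // Zeros and infinities have fixed single-format bit patterns
        System.out.printf("+0.0f -> 0x%08X%n", Float.floatToRawIntBits(0.0f));                    // 0x00000000
        System.out.printf("-0.0f -> 0x%08X%n", Float.floatToRawIntBits(-0.0f));                   // 0x80000000
        System.out.printf("+inf  -> 0x%08X%n", Float.floatToRawIntBits(Float.POSITIVE_INFINITY)); // 0x7F800000
        System.out.printf("-inf  -> 0x%08X%n", Float.floatToRawIntBits(Float.NEGATIVE_INFINITY)); // 0xFF800000
        // floatToIntBits collapses every NaN to one canonical pattern
        System.out.printf("NaN   -> 0x%08X%n", Float.floatToIntBits(Float.NaN));                  // 0x7FC00000

        // 0.1 is in the double value set, but the nearest float is a different value
        double d = 0.1;
        System.out.println((double) (float) d == d); // false
    }
}
```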
Except for NaN, floating-point values are ordered; arranged from smallest to largest, they are negative infinity, negative finite nonzero values, positive and negative zero, positive finite nonzero values, and positive infinity. Positive zero and negative zero compare equal; thus the result of the expression 0.0==-0.0 is true and the result of 0.0>-0.0 is false. But other operations can distinguish positive and negative zero; for example, 1.0/0.0 has the value positive infinity, while the value of 1.0/-0.0 is negative infinity.
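The comparisons and divisions above can be checked directly. The sketch assumes Java semantics. The helper isNegativeZero is hypothetical and relies on the fact that == cannot tell the zeros apart but dividing 1.0 by them can.

```java
class SignedZero {
    // Hypothetical helper: true only for -0.0, since 1.0 / -0.0 is negative infinity
    static boolean isNegativeZero(double x) {
        return x == 0.0 && 1.0 / x < 0;
    }

    public static void main(String[] args) {
        System.out.println(0.0 == -0.0);          // true: the zeros compare equal
        System.out.println(0.0 > -0.0);           // false
        System.out.println(1.0 / 0.0);            // Infinity
        System.out.println(1.0 / -0.0);           // -Infinity
        System.out.println(isNegativeZero(-0.0)); // true
        System.out.println(isNegativeZero(0.0));  // false
    }
}
```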
NaN is unordered, so the numerical comparison operators <, <=, >, and >= return false if either or both operands are NaN. The equality operator == returns false if either operand is NaN, and the inequality operator != returns true if either operand is NaN. In particular, x!=x is true if and only if x is NaN, and (x<y) == !(x>=y) is false if x or y is NaN.
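The NaN rules can be demonstrated with one NaN and one ordinary operand. The sketch assumes Java semantics. The helper isNaN is hypothetical; it relies on x != x being true only for NaN, which gives the same result as Double.isNaN.

```java
class NaNComparisons {
    // Hypothetical helper: x != x holds if and only if x is NaN
    static boolean isNaN(double x) {
        return x != x;
    }

    public static void main(String[] args) {
        double x = Double.NaN;
        double y = 1.0;
        System.out.println(x < y || x <= y || x > y || x >= y); // false: every ordering test fails
        System.out.println(x == x);                             // false
        System.out.println(x != y);                             // true
        System.out.println((x < y) == !(x >= y));               // false
        System.out.println(isNaN(x) + " " + isNaN(y));          // true false
    }
}
```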
A value of any floating-point type may be cast to or from any numeric type, and a value of type String may be cast to a floating-point type. If the string cannot be parsed as a value of the target floating-point type, an ExpressionTargetException is thrown with a nested NumberFormatException. There are no casts between floating-point types and the type Boolean.
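ExpressionTargetException belongs to this environment and is not reproduced here. The sketch below assumes the String conversion delegates to a parser like Java's Double.parseDouble, which throws NumberFormatException on bad input. It uses a hypothetical stand-in exception, HypotheticalCastException, to show the "nested" structure, with the parse failure kept as the cause. The numeric casts shown use plain Java semantics, which this environment may or may not share.

```java
class StringToFloatingPoint {
    // Hypothetical stand-in for the environment's own exception type
    static class HypotheticalCastException extends RuntimeException {
        HypotheticalCastException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical cast helper: wraps the NumberFormatException as the cause
    static double castToDouble(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            throw new HypotheticalCastException("cannot cast \"" + s + "\" to double", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(castToDouble("1.5e3")); // 1500.0
        try {
            castToDouble("abc");
        } catch (HypotheticalCastException e) {
            System.out.println(e.getCause() instanceof NumberFormatException); // true
        }
        // Plain Java numeric casts: double to int truncates toward zero
        System.out.println((int) -3.9);  // -3
        System.out.println((double) 7);  // 7.0
    }
}
```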