5.3 Floats
==========
A floating-point number or “float” is a number stored in scientific
notation. The number of significant digits in the fractional part is
governed by the current floating precision (see Precision). The
range of acceptable values is from ‘10^-3999999’ (inclusive) to
‘10^4000000’ (exclusive), plus the corresponding negative values and
zero.
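As an illustration only, the documented range can be checked with Python's ‘decimal’ module, which also stores numbers in decimal. The helper name ‘in_calc_float_range’ is hypothetical. The sketch assumes only the bounds stated above and is not Calc's own code. It relies on ‘Decimal.adjusted()’, the exponent of the leading digit, so a nonzero value is in range exactly when that exponent lies between -3999999 and 3999999.

```python
from decimal import Decimal

# Bounds taken from the text: 10^-3999999 (inclusive) to 10^4000000 (exclusive).
MIN_ADJ_EXP = -3999999
MAX_ADJ_EXP = 3999999

def in_calc_float_range(x: Decimal) -> bool:
    """Hypothetical check of the documented float range; zero is always allowed."""
    if not x.is_finite():
        return False  # infinities and NaNs are not ordinary floats
    if x.is_zero():
        return True
    # adjusted() is the exponent of the leading digit: 10**adj <= |x| < 10**(adj + 1)
    return MIN_ADJ_EXP <= x.adjusted() <= MAX_ADJ_EXP

print(in_calc_float_range(Decimal("1e-3999999")))  # smallest allowed magnitude
print(in_calc_float_range(Decimal("1e4000000")))   # just past the upper bound
```

Negative values are handled automatically, since ‘adjusted()’ depends only on the magnitude.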
Calculations that would exceed the allowable range of values (such as
‘exp(exp(20))’) are left in symbolic form by Calc. The messages
“floating-point overflow” or “floating-point underflow” indicate that
during the calculation a number would have been produced that was too
large or too close to zero, respectively, to be represented by Calc.
This does not necessarily mean the final result would have overflowed,
just that an overflow occurred while computing the result. (In fact, it
could report an underflow even though the final result would have
overflowed!)
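The following sketch uses Python's ‘decimal’ module as a stand-in for Calc's arithmetic to imitate the behavior described above. When ‘exp’ would leave the documented range, the hypothetical helper ‘calc_exp’ returns an unevaluated formula as a string instead of a number. This is only an approximation. Python's ‘decimal’ also permits subnormal values below ‘Emin’, and Calc's real symbolic handling is far more general than a string.

```python
from decimal import Context, Decimal, Overflow, Underflow, ROUND_HALF_UP

def calc_exp(x: Decimal, prec: int = 12) -> "Decimal | str":
    # Context limited to the documented exponent range (approximate: Python's
    # decimal additionally allows subnormal results below Emin).
    ctx = Context(prec=prec, rounding=ROUND_HALF_UP,
                  Emax=3999999, Emin=-3999999,
                  traps=[Overflow, Underflow])
    try:
        return ctx.exp(x)
    except (Overflow, Underflow):
        # Too large or too close to zero: keep the call unevaluated.
        return f"exp({x})"

inner = calc_exp(Decimal(20))  # about 4.85e8, well within range
outer = calc_exp(inner)        # e^(4.85e8) is far above 10^4000000
print(inner)
print(outer)                   # an unevaluated "exp(...)" string
```

Here the inner call succeeds and only the outer one overflows, so the result keeps the evaluated argument inside the unevaluated ‘exp’.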
If a rational number and a float are mixed in a calculation, the
result will in general be expressed as a float. Commands that require
an integer value (such as ‘k g’ [‘gcd’]) will also accept integer-valued
floats, i.e., floating-point numbers with nothing after the decimal
point.
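Here is a minimal sketch of both rules. It assumes Python's ‘fractions’ and ‘decimal’ modules and a 12-digit working precision, which is Calc's default. The names ‘mixed_add’ and ‘calc_gcd’ are hypothetical. The first converts the rational to a float before adding, on the assumption that this matches "expressed as a float". The second accepts floats such as ‘12.’ by checking that they are integer-valued and converting them to exact integers.

```python
import math
from decimal import Context, Decimal, ROUND_HALF_UP
from fractions import Fraction

CTX = Context(prec=12, rounding=ROUND_HALF_UP)  # 12 digits, Calc's default precision

def mixed_add(q: Fraction, x: Decimal) -> Decimal:
    # Convert the rational to a float at the working precision, then add.
    q_float = CTX.divide(Decimal(q.numerator), Decimal(q.denominator))
    return CTX.add(q_float, x)

def calc_gcd(a: Decimal, b: Decimal) -> int:
    # Integer-valued floats (nothing after the point) become exact integers.
    values = []
    for v in (a, b):
        if not v.is_finite() or v != v.to_integral_value():
            raise ValueError(f"{v} is not an integer-valued float")
        values.append(int(v))
    return math.gcd(*values)

print(mixed_add(Fraction(1, 3), Decimal("1.0")))  # 1.33333333333
print(calc_gcd(Decimal("12."), Decimal("18.")))   # 6
```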
Floats are identified by the presence of a decimal point and/or an
exponent. In general a float consists of an optional sign, digits
including an optional decimal point, and an optional exponent consisting
of an ‘e’, an optional sign, and up to seven exponent digits. For
example, ‘23.5e-2’ is 23.5 times ten to the power of minus two, or
0.235.
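The syntax just described can be written as a regular expression. The sketch below is one reading of the prose, not Calc's actual parser. The name ‘looks_like_float’ is hypothetical, and only a lowercase ‘e’ is accepted for the exponent. The extra check at the end enforces the rule that a float needs a decimal point and/or an exponent, so a plain digit string is treated as an integer.

```python
import re
from decimal import Decimal

# A reading of the grammar above (assumption: not Calc's real parser).
FLOAT_SYNTAX = re.compile(
    r"[+-]?"                # optional sign
    r"(?:\d+\.?\d*|\.\d+)"  # digits, with an optional decimal point
    r"(?:e[+-]?\d{1,7})?"   # optional exponent: 'e', optional sign, 1 to 7 digits
)

def looks_like_float(text: str) -> bool:
    if FLOAT_SYNTAX.fullmatch(text) is None:
        return False
    # A float needs a decimal point and/or an exponent; bare digits are an integer.
    return "." in text or "e" in text

print(looks_like_float("23.5e-2"), Decimal("23.5e-2"))  # True 0.235
print(looks_like_float("23"))                          # False
```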
Floating-point numbers are normally displayed in decimal notation
with all significant figures shown. Exceedingly large or small numbers
are displayed in scientific notation. Various other display options are
available; see Float Formats.
Floating-point numbers are stored in decimal, not binary. The result
of each operation is rounded to the nearest value representable in the
number of significant digits specified by the current precision,
rounding away from zero in the case of a tie. Thus (in the default
display mode) what you see is exactly what you get. Some operations
such as square roots and transcendental functions are performed with
several digits of extra precision and then rounded back to the current
precision, in an effort to make the final result accurate to the full
requested precision. However, accuracy is not rigorously guaranteed. If
you doubt the validity of a result, try repeating the calculation at a
higher precision. The Calculator’s arithmetic is not intended to be
IEEE-conformant in any way.
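Python's ‘decimal’ module can mimic these rules closely enough for experimentation. Its ‘ROUND_HALF_UP’ mode rounds ties away from zero, as described above. The helpers ‘calc_context’ and ‘sqrt_with_guard’ are hypothetical. The number of guard digits (3 here) is an assumption, since the text only says "several digits". The final rounding step is a second rounding, which is one reason the last digit is not rigorously guaranteed.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

def calc_context(prec: int = 12) -> Context:
    # ROUND_HALF_UP in Python's decimal rounds ties away from zero.
    return Context(prec=prec, rounding=ROUND_HALF_UP,
                   Emax=3999999, Emin=-3999999)

def sqrt_with_guard(x: Decimal, prec: int = 12, guard: int = 3) -> Decimal:
    # Work with `guard` extra digits, then round again to `prec` digits.
    wide = calc_context(prec + guard).sqrt(x)
    return calc_context(prec).plus(wide)

ctx = calc_context()
print(ctx.divide(Decimal(2), Decimal(3)))       # 0.666666666667
print(ctx.add(Decimal("0.1"), Decimal("0.2")))  # 0.3 (binary floats give 0.30000000000000004)
print(calc_context(3).plus(Decimal("2.345")))   # 2.35 (a tie, rounded away from zero)
print(sqrt_with_guard(Decimal(2)))              # 1.41421356237
```

Because every value is held in decimal, ‘0.1 + 0.2’ comes out as exactly ‘0.3’, which is the "what you see is exactly what you get" property described above.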
While floats are always _stored_ in decimal, they can be entered and
displayed in any radix just like integers and fractions. Since a float
that is entered in a radix other than 10 will be converted to decimal,
the number that Calc stores may not be exactly the number that was
entered; it will be the closest decimal approximation given the current
precision. The notation ‘RADIX#DDD.DDD’ is a floating-point number
whose digits are in the specified radix. Note that the ‘.’ is more
aptly referred to as a “radix point” than as a decimal point in this
case. The number ‘8#123.4567’ is defined as ‘8#1234567 * 8^-4’. If the
radix is 14 or less, you can use ‘e’ notation to write a non-decimal
number in scientific notation. The exponent is written in decimal, and
is considered to be a power of the radix: ‘8#1234567e-4’. If the radix
is 15 or above, the letter ‘e’ is a digit, so scientific notation must
be written out, e.g., ‘16#123.4567*16^2’. The first two exercises of
the Modes Tutorial explore some of the properties of non-decimal floats.
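The conversion rule can be sketched with a hypothetical helper, ‘parse_radix_float’. It reads ‘RADIX#DDD.DDD’ notation, treats ‘e’ as an exponent marker only when the radix is 14 or less, and rounds the result to a given number of decimal digits. The sketch first computes the exact value as a fraction and then rounds once. That is one reasonable reading of "closest decimal approximation", not necessarily how Calc does it internally. Signs are not handled.

```python
from decimal import Context, Decimal, ROUND_HALF_UP
from fractions import Fraction

def parse_radix_float(text: str, prec: int = 12) -> Decimal:
    """Hypothetical helper: 'RADIX#DDD.DDD[eN]' -> nearest decimal at `prec` digits."""
    radix_str, body = text.split("#", 1)
    radix = int(radix_str)
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")
    body = body.lower()
    exponent = 0
    # 'e' marks an exponent only when it cannot be a digit, i.e. radix <= 14.
    if radix <= 14 and "e" in body:
        body, exp_str = body.split("e", 1)
        exponent = int(exp_str)  # written in decimal, a power of the radix
    int_part, _, frac_part = body.partition(".")
    mantissa = int(int_part + frac_part, radix)  # e.g. 8#1234567
    exponent -= len(frac_part)                   # 8#123.4567 = 8#1234567 * 8^-4
    exact = Fraction(mantissa) * Fraction(radix) ** exponent
    # Round the exact value once, to `prec` significant decimal digits.
    ctx = Context(prec=prec, rounding=ROUND_HALF_UP)
    return ctx.divide(Decimal(exact.numerator), Decimal(exact.denominator))

print(parse_radix_float("8#123.4567"))    # 83.5915527344
print(parse_radix_float("8#1234567e-4"))  # same value, scientific form
print(parse_radix_float("3#0.1"))         # 0.333333333333 (not exact)
```

The exact value of ‘8#123.4567’ is 83.591552734375, which needs 14 significant digits. At a 12-digit precision the stored float is therefore only an approximation of the number that was entered.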