Float Manager
 


33 Float Manager


   
   

Palm OS® Programmer's API Reference

Part II: System Management

33 Float Manager

Float Manager Data Structures

FlpCompDouble

FlpDoubleBits

Float Manager Functions

FlpAToF

FlpBase10Info

FlpBufferAToF

FlpBufferCorrectedAdd

FlpBufferCorrectedSub

FlpCorrectedAdd

FlpCorrectedSub

FlpFToA

FlpGetExponent

FlpGetSign

FlpIsZero

FlpNegate

FlpSetNegative

FlpSetPositive

FlpVersion

       

This chapter provides reference material for the Float Manager API as follows:

Float Manager Data Structures

Float Manager Functions

The Float Manager API is declared in the header file FloatMgr.h. For more information on the Float Manager, see the section "Floating-Point" in the Palm OS Programmer's Companion, vol. I.

Float Manager Data Structures

FlpCompDouble

Float Manager functions accept and return values of type FlpDouble. The FlpCompDouble union lets you declare a value that can be interpreted either as a double or as an FlpDouble. The union also contains fields that provide easy access to the component parts of the double-precision floating-point number.

typedef union {
  double d;
  FlpDouble fd;
  UInt32 ul[2];
  FlpDoubleBits fdb;
} FlpCompDouble;
  

Field Descriptions

d
Provides access to the value as a double.
fd
Provides access to the value as an FlpDouble, which can be passed to or received from many Float Manager functions.
ul
Provides access to the value as two 32-bit unsigned integers (UInt32).
fdb
Provides access to specific fields.
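
The following is a minimal sketch of the union pattern described above, written for a desktop C compiler rather than for Palm OS. The names DemoFlpDouble, DemoCompDouble, demo_from_double, and demo_to_double are hypothetical stand-ins, and the sketch assumes that FlpDouble is an opaque 8-byte value; the real definitions come from FloatMgr.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FlpDouble: assumed to be an opaque 8-byte value. */
typedef struct { uint32_t words[2]; } DemoFlpDouble;

/* Hypothetical stand-in for FlpCompDouble (without the bit-field view). */
typedef union {
    double        d;      /* view as a native double */
    DemoFlpDouble fd;     /* view as the value handed to Float Manager style calls */
    uint32_t      ul[2];  /* view as two 32-bit words */
} DemoCompDouble;

/* Wrap a native double so it can be passed where a DemoFlpDouble is expected. */
static DemoFlpDouble demo_from_double(double value)
{
    DemoCompDouble c;
    assert(sizeof(double) == sizeof(DemoFlpDouble));
    c.d = value;
    return c.fd;
}

/* Unwrap a DemoFlpDouble back into a native double. */
static double demo_to_double(DemoFlpDouble value)
{
    DemoCompDouble c;
    c.fd = value;
    return c.d;
}
```

Writing through one member and reading through another reinterprets the same 8 bytes, so no numeric conversion takes place in either direction.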

FlpDoubleBits

This structure provides direct access to the component parts of an IEEE-754 double-precision floating-point number. Use the FlpCompDouble union to convert numbers of type double to and from FlpDoubleBits.

typedef struct {
  UInt32 sign : 1;
  Int32 exp : 11;
  UInt32 manH : 20;
  UInt32 manL;
} FlpDoubleBits;

Field Descriptions

sign
The sign bit. You can also use the FlpGetSign macro to obtain the sign bit, and the FlpNegate, FlpSetNegative, and FlpSetPositive macros to set the sign bit.
exp
The bits that make up the exponent. You can also use the FlpGetExponent macro to obtain the exponent value.
manH
The most-significant 20 bits of the mantissa.
manL
The least-significant 32 bits of the mantissa.
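
As an illustration of how these four fields map onto the 64 bits of an IEEE-754 double, the sketch below splits a value with shifts and masks instead of bit-fields, so that it does not depend on a compiler's bit-field layout. DemoDoubleFields and demo_split are hypothetical names; the sketch assumes a host whose double is an IEEE-754 binary64 value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in mirroring the fields of FlpDoubleBits. */
typedef struct {
    uint32_t sign;  /* 1 bit: 1 for negative, 0 otherwise */
    uint32_t exp;   /* 11 bits, as stored (still biased by 1023) */
    uint32_t manH;  /* most-significant 20 bits of the mantissa */
    uint32_t manL;  /* least-significant 32 bits of the mantissa */
} DemoDoubleFields;

/* Split a double into its sign, exponent, and mantissa fields. */
static DemoDoubleFields demo_split(double value)
{
    uint64_t bits;
    DemoDoubleFields f;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);   /* copy the raw bit pattern */

    f.sign = (uint32_t)(bits >> 63);
    f.exp  = (uint32_t)((bits >> 52) & 0x7FFu);
    f.manH = (uint32_t)((bits >> 32) & 0xFFFFFu);
    f.manL = (uint32_t)(bits & 0xFFFFFFFFu);
    return f;
}
```

For example, -2.5 is -1.25 x 2^1, so the sketch reports a sign of 1, a stored exponent of 1024, and a mantissa whose only set bit is the one worth 0.25.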

Float Manager Functions

FlpAToF

Purpose

Convert a null-terminated ASCII string to a 64-bit floating-point number. The string must have the format:

[+|-][digits][.][digits][e|E[+|-][digits]] 

Prototype

FlpDouble FlpAToF (const Char *s)

Parameters

-> s    Pointer to the string to be converted.

Result

Returns the value of the string as a floating-point number.

Comments

The mantissa of the number is limited to 32 bits.

This function is nearly compatible with the ANSI C library. ANSI requires the form:

[+|-]digits[.][digits][(e|E)[+|-]digits]

To maintain backward compatibility with Palm OS 1.0, this function treats all of the "digits" sections as optional. The following table shows the ANSI and Palm OS behavior for some sample strings:
String       | ANSI       | Palm OS > 1.0 | Palm OS 1.0 [1] | Notes
"+"          | invalid    | +0            | +0              | ANSI requires at least one digit.
".3"         | invalid    | 0.3           | 0.3             | ANSI requires a leading digit.
"0.3e123"    | 0.3e123    | 0.3e123       | 0.3e12          | Palm OS 1.0 only allows 1 or 2 digit exponent.
"+1"         | 1          | 1             | +0              | Palm OS 1.0 doesn't allow a leading '+' sign.
"1e+2"       | 1e2        | 1e2           | 1               | Palm OS 1.0 doesn't allow a '+' sign in the exponent.
"0.3E3"      | 0.3e3      | 0.3e3         | 0.3             | Palm OS 1.0 doesn't allow a capital 'E' to mark the exponent.
"4294967297" | 4294967297 | 4294967297    | 1               | Palm OS 1.0 uses unsigned long and wraps around.

[1] Using the old Float Manager documented in Appendix C, "1.0 Float Manager."
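
To make the lenient grammar concrete, here is a minimal sketch of a parser in which every "digits" run is optional, as described above. It is not the Palm OS implementation: demo_lenient_atof is a hypothetical helper that returns a native double, ignores the 32-bit mantissa limit, and caps the exponent it will accumulate at an arbitrary bound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: parses [+|-][digits][.][digits][e|E[+|-][digits]]
 * with every digit run optional. Parsing stops at the first character
 * that does not fit the grammar. */
static double demo_lenient_atof(const char *s)
{
    double value = 0.0, scale = 1.0;
    int negative = 0, exp_negative = 0, exponent = 0;

    assert(s != NULL);

    if (*s == '+' || *s == '-')               /* optional sign */
        negative = (*s++ == '-');

    while (*s >= '0' && *s <= '9')            /* optional integer digits */
        value = value * 10.0 + (*s++ - '0');

    if (*s == '.') {                          /* optional fraction */
        s++;
        while (*s >= '0' && *s <= '9') {
            value = value * 10.0 + (*s++ - '0');
            scale *= 10.0;
        }
    }

    if (*s == 'e' || *s == 'E') {             /* optional exponent */
        s++;
        if (*s == '+' || *s == '-')
            exp_negative = (*s++ == '-');
        while (*s >= '0' && *s <= '9') {
            if (exponent < 1000)              /* arbitrary cap to avoid overflow */
                exponent = exponent * 10 + (*s - '0');
            s++;
        }
    }

    value /= scale;
    while (exponent-- > 0)
        value = exp_negative ? value / 10.0 : value * 10.0;

    return negative ? -value : value;
}
```

With this sketch, the strings "+" and ".3" from the table are accepted and yield 0 and 0.3, while "1e+2" yields 100.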

Compatibility

Implemented only if 2.0 New Feature Set is present. GCC users must use FlpBufferAToF instead of this function.

See Also

FlpFToA

FlpBase10Info

Purpose

Extract detailed information on the base 10 form of a floating-point number: the base 10 mantissa, exponent, and sign.

Prototype

Err FlpBase10Info (FlpDouble a, UInt32 *mantissaP, Int16 *exponentP, Int16 *signP)

Parameters

-> a            The floating-point number.
<- mantissaP    The base 10 mantissa.
<- exponentP    The base 10 exponent.
<- signP        The sign: 1 if the number is negative, 0 otherwise.

Result

Returns 0 if no error, or flpErrOutOfRange if the supplied floating-point number is either not a number (NaN) or is infinite.

Comments

The mantissa is normalized so it contains at least 8 significant digits when printed as an integer value.
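
The sketch below illustrates one way such a decomposition can work. It is an assumption-laden illustration, not the Float Manager's algorithm: demo_base10_info is a hypothetical function that scales the mantissa to exactly 8 decimal digits, assumes the convention value = mantissa x 10^exponent, treats negative zero as positive, and returns -1 where the real function returns flpErrOutOfRange.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: a is approximately
 * (sign ? -1 : 1) * mantissa * 10^exponent, with an 8-digit mantissa.
 * Returns 0 on success, -1 for NaN or infinity. */
static int demo_base10_info(double a, uint32_t *mantissaP,
                            int16_t *exponentP, int16_t *signP)
{
    int16_t exponent = 0;
    uint32_t mantissa;

    assert(mantissaP != NULL && exponentP != NULL && signP != NULL);

    if (a != a || a - a != 0.0)       /* NaN, or infinity (inf - inf is NaN) */
        return -1;

    *signP = (a < 0.0) ? 1 : 0;
    if (a < 0.0)
        a = -a;

    if (a == 0.0) {                   /* zero has no digits to normalize */
        *mantissaP = 0;
        *exponentP = 0;
        return 0;
    }

    while (a >= 1e8) { a /= 10.0; exponent++; }   /* too many digits */
    while (a < 1e7)  { a *= 10.0; exponent--; }   /* too few digits */

    mantissa = (uint32_t)(a + 0.5);   /* round to the nearest integer */
    if (mantissa == 100000000u) {     /* rounding carried into a 9th digit */
        mantissa = 10000000u;
        exponent++;
    }

    *mantissaP = mantissa;
    *exponentP = exponent;
    return 0;
}
```

Under these assumptions, 1234.5 decomposes into a mantissa of 12345000 and an exponent of -4.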

Compatibility

Implemented only if 2.0 New Feature Set is present.

See Also

FlpGetExponent, FlpGetSign

FlpBufferAToF

Purpose

Convert a null-terminated ASCII string to a floating-point number. The string must be in the format: [-]x[.]yyyyyyyy[e[-]zz]

Prototype

void FlpBufferAToF (FlpDouble *result, const Char *s)

Parameters

<- result    Pointer to the structure into which the return value is placed.
-> s         Pointer to the null-terminated ASCII string to be converted.

Result

Returns nothing. The value of the string, as a floating-point number, is placed in the structure pointed to by result.

Comments

See FlpAToF for a complete description of this function.

Compatibility

Implemented only if 2.0 New Feature Set is present. GCC users must use this function instead of FlpAToF due to incompatibilities in the way GCC handles structure return values. CodeWarrior users can use either function; they are binary compatible.

FlpBufferCorrectedAdd

Purpose

Adds two floating-point numbers and corrects for least-significant-bit errors when the result should be zero but is instead very close to zero.

Prototype

void FlpBufferCorrectedAdd (FlpDouble *result, FlpDouble firstOperand, FlpDouble secondOperand, Int16 howAccurate)

Parameters

<- result           Pointer to the structure into which the return value is placed.
-> firstOperand     The first of the two numbers to be added.
-> secondOperand    The second of the two numbers to be added.
-> howAccurate      The smallest difference in exponents that won't force the result to zero. The value returned from this function is forced to zero if the difference between exponents in the smaller of the two operands and the result exceeds this value. Supply a value of zero for this parameter to obtain the default level of accuracy (which is equivalent to a howAccurate value of 48).

Result

Returns nothing. The calculated result is placed in the structure pointed to by result.

Comments

See FlpCorrectedAdd for a complete description of this function.

Compatibility

Implemented only if 2.0 New Feature Set is present. GCC users must use this function instead of FlpCorrectedAdd due to incompatibilities in the way GCC handles structure return values. CodeWarrior users can use either function; they are binary compatible.

FlpBufferCorrectedSub

Purpose

Subtracts two floating-point numbers and corrects for least-significant-bit errors when the result should be zero but is instead very close to zero.

Prototype

void FlpBufferCorrectedSub (FlpDouble *result, FlpDouble firstOperand, FlpDouble secondOperand, Int16 howAccurate)

Parameters

<- result           Pointer to the structure into which the return value is placed.
-> firstOperand     The value from which secondOperand is to be subtracted.
-> secondOperand    The value to subtract from firstOperand.
-> howAccurate      The smallest difference in exponents that won't force the result to zero. The value returned from this function is forced to zero if the difference between exponents in the smaller of the two operands and the result exceeds this value. Supply a value of zero for this parameter to obtain the default level of accuracy (which is equivalent to a howAccurate value of 48).

Result

Returns nothing. The calculated result is placed in the structure pointed to by result.

Comments

See FlpCorrectedSub for a complete description of this function.

Compatibility

Implemented only if 2.0 New Feature Set is present. GCC users must use this function instead of FlpCorrectedSub due to incompatibilities in the way GCC handles structure return values. CodeWarrior users can use either function; they are binary compatible.

FlpCorrectedAdd

Purpose

Adds two floating-point numbers and corrects for least-significant-bit errors when the result should be zero but is instead very close to zero.

Prototype

FlpDouble FlpCorrectedAdd (FlpDouble firstOperand, FlpDouble secondOperand, Int16 howAccurate)

Parameters

-> firstOperand     The first of the two numbers to be added.
-> secondOperand    The second of the two numbers to be added.
-> howAccurate      The smallest difference in exponents that won't force the result to zero. The value returned from FlpCorrectedAdd is forced to zero if, when the exponent of the result of the addition is subtracted from the exponent of the smaller of the two operands, the difference exceeds the value specified for howAccurate. Supply a value of zero for this parameter to obtain the default level of accuracy (which is equivalent to a howAccurate value of 48).

Result

Returns the calculated result.

Comments

Adding or subtracting a large number and a small number produces a result similar in magnitude to the larger number. Adding or subtracting two numbers that are similar in magnitude can, depending on their signs, produce a result with a very small exponent (that is, a negative exponent that is large in magnitude). If the difference between the result's exponent and that of the operands is close to the number of significant bits expressible by the mantissa, it is quite possible that the result should in fact be zero.

There are also cases where it may be useful to retain accuracy in the low-order bits of the mantissa, for instance 99999999 + 0.00000001 - 99999999. However, unless the fractional part is an exact (negative) power of two, it is doubtful that the few mantissa bits still available are enough to represent the fractional value properly. In this example, 99999999 requires 26 bits, leaving 26 bits for the 0.00000001; this guarantees inaccuracy after the subtraction.

The problem arises from the difficulty in representing decimal fractions such as 0.1 in binary. After about three successive additions or subtractions, errors begin to appear in the least significant bits of the mantissa. If the value represented by the most significant bits of the mantissa is then subtracted away, the least-significant-bit error is normalized and becomes the actual result, when in fact the result should be zero.

This problem is only an issue for addition and subtraction.
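
The correction rule can be sketched on a desktop compiler as follows. This is an illustration of the rule as described for the howAccurate parameter, not the Float Manager's code: demo_corrected_add, demo_corrected_sub, and the helper functions are hypothetical, operate on native doubles, and compare binary exponents taken directly from the IEEE-754 bit pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_DEFAULT_ACCURACY 48   /* default stated for howAccurate */

/* Stored (biased) binary exponent, 0..2047, from the IEEE-754 bit pattern.
 * The bias cancels out when two exponents are subtracted. */
static int demo_exponent(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)((bits >> 52) & 0x7FFu);
}

static double demo_abs(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical sketch: add, then force a suspiciously tiny result to zero. */
static double demo_corrected_add(double a, double b, int howAccurate)
{
    double sum = a + b;
    double smaller = (demo_abs(a) < demo_abs(b)) ? a : b;

    assert(sizeof(double) == sizeof(uint64_t));
    if (howAccurate == 0)
        howAccurate = DEMO_DEFAULT_ACCURACY;

    /* Exponent of the smaller operand minus exponent of the result. */
    if (sum != 0.0 && demo_exponent(smaller) - demo_exponent(sum) > howAccurate)
        return 0.0;
    return sum;
}

/* Subtraction is addition of the negated second operand. */
static double demo_corrected_sub(double a, double b, int howAccurate)
{
    return demo_corrected_add(a, -b, howAccurate);
}
```

With IEEE-754 doubles, subtracting 0.3 from the sum 0.1 + 0.2 leaves a residue of about 5.6e-17 under plain arithmetic; the exponent of that residue is 52 below the exponent of the operands, which exceeds the default of 48, so the sketch returns zero instead.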

Compatibility

Implemented only if 2.0 New Feature Set is present. GCC users must use FlpBufferCorrectedAdd instead of this function.

See Also

FlpCorrectedSub

FlpCorrectedSub

Purpose

Subtracts two floating-point numbers and corrects for least-significant-bit errors when the result should be zero but is instead very close to zero.

Prototype

FlpDouble FlpCorrectedSub (FlpDouble firstOperand, FlpDouble secondOperand, Int16 howAccurate)

Parameters

-> firstOperand     The value from which secondOperand is to be subtracted.
-> secondOperand    The value to subtract from firstOperand.
-> howAccurate      The smallest difference in exponents that won't force the result to zero. The value returned from FlpCorrectedSub is forced to zero if, when the exponent of the result of the subtraction is subtracted from the exponent of the smaller of the two operands, the difference exceeds the value specified for howAccurate. Supply a value of zero for this parameter to obtain the default level of accuracy (which is equivalent to a howAccurate value of 48).

Result

Returns the calculated result.

Comments

See the comments for FlpCorrectedAdd.

Compatibility

Implemented only if 2.0 New Feature Set is present. GCC users must use FlpBufferCorrectedSub instead of this function.

FlpFToA

Purpose

Convert a floating-point number to a null-terminated ASCII string in exponential format: [-]x.yyyyyyye[-]zz

Prototype

Err FlpFToA (FlpDouble a, Char *s)

Parameters

-> a    Floating-point number.
<- s    Pointer to buffer to contain the ASCII string.

Result

Returns 0 if no error, or flpErrOutOfRange if the supplied value is infinite or is not a number. In this case, the buffer is set to the string "INF", "-INF", or "NaN" as appropriate.
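
The error behavior described above can be sketched as follows. demo_ftoa and DEMO_ERR_OUT_OF_RANGE are hypothetical stand-ins for FlpFToA and flpErrOutOfRange, and the 16-byte buffer size is an assumption of this sketch. For finite values the sketch falls back on the C library's %e conversion, which only approximates the layout FlpFToA produces (for example, the C library always writes an exponent sign).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_ERR_OUT_OF_RANGE 1   /* hypothetical stand-in for flpErrOutOfRange */
#define DEMO_BUFFER_SIZE 16       /* assumed: fits "-1.2345678e+308" plus NUL */

/* Hypothetical sketch: s must point to at least DEMO_BUFFER_SIZE bytes. */
static int demo_ftoa(double a, char *s)
{
    assert(s != NULL);

    if (a != a) {                          /* NaN compares unequal to itself */
        strcpy(s, "NaN");
        return DEMO_ERR_OUT_OF_RANGE;
    }
    if (a - a != 0.0) {                    /* infinity: inf - inf is NaN */
        strcpy(s, a < 0.0 ? "-INF" : "INF");
        return DEMO_ERR_OUT_OF_RANGE;
    }

    /* Approximation only: not the exact digit layout of FlpFToA. */
    snprintf(s, DEMO_BUFFER_SIZE, "%.7e", a);
    return 0;
}
```

A caller should check the return value before treating the buffer as a number, because on error the buffer holds one of the three marker strings rather than digits.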

Compatibility

Implemented only if 2.0 New Feature Set is present.

See Also

FlpAToF

FlpGetExponent

Purpose

Macro that returns the exponent of a 64-bit floating-point value. The exponent bias has already been removed from the returned value, so it ranges from -1023 to +1024.

Prototype

FlpGetExponent (x)

Parameters

-> x    The value from which the exponent is to be extracted.

Result

Returns a UInt32 containing the exponent of the specified value.

Compatibility

Implemented only if 2.0 New Feature Set is present.

See Also

FlpBase10Info, FlpGetSign

FlpGetSign

Purpose

Macro that returns the sign of a 64-bit floating-point value.

Prototype

FlpGetSign (x)

Parameters

-> x    The value from which the sign bit is to be extracted.

Result

Returns a UInt32 with a nonzero value if the specified value is negative, and with a zero value if it is positive.

Compatibility

Implemented only if 2.0 New Feature Set is present.

See Also

FlpBase10Info, FlpGetExponent, FlpNegate, FlpSetNegative, FlpSetPositive

FlpIsZero

Purpose

Macro that returns whether the specified 64-bit floating-point value is zero.

Prototype

FlpIsZero (x)

Parameters

-> x    The value to be tested for zero.

Result

Returns a UInt32 with a nonzero value if the specified value is zero, and with a zero value if the specified value is other than zero.

Compatibility

Implemented only if 2.0 New Feature Set is present.

FlpNegate

Purpose

Macro that changes the sign bit of a 64-bit floating-point number.

Prototype

FlpNegate (x)

Parameters

-> x    The value in which the sign bit is to be changed.

Result

Returns a 64-bit floating-point value which is the negative of the value specified by x.

Compatibility

Implemented only if 2.0 New Feature Set is present.

See Also

FlpGetSign, FlpSetNegative, FlpSetPositive

FlpSetNegative

Purpose

Macro that ensures that a 64-bit floating-point number is negative.

Prototype

FlpSetNegative (x)

Parameters

-> x    The value that is to be forced negative.

Result

If the supplied 64-bit floating-point value is negative, that value is returned unchanged. If the supplied value is positive, the negative of that value is returned.

Compatibility

Implemented only if 2.0 New Feature Set is present.

See Also

FlpGetSign, FlpNegate, FlpSetPositive

FlpSetPositive

Purpose

Macro that ensures that a 64-bit floating-point number is positive.

Prototype

FlpSetPositive (x)

Parameters

-> x    The value that is to be forced positive.

Result

If the supplied 64-bit floating-point value is positive, that value is returned unchanged. If the supplied value is negative, its absolute value is returned.

Compatibility

Implemented only if 2.0 New Feature Set is present.

See Also

FlpGetSign, FlpNegate, FlpSetNegative
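
The sign macros described above (FlpGetSign, FlpNegate, FlpSetNegative, and FlpSetPositive) all act on the single sign bit of the IEEE-754 representation. The sketch below expresses the same operations as plain C functions on native doubles; the demo_ names are hypothetical, and the functions return a new value, following the Result descriptions, rather than reproducing the macro definitions in FloatMgr.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SIGN_BIT ((uint64_t)1 << 63)   /* bit 63 of the 64-bit pattern */

/* Copy a double's raw bit pattern into an integer. */
static uint64_t demo_bits(double x)
{
    uint64_t b;
    assert(sizeof b == sizeof x);
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Rebuild a double from a raw bit pattern. */
static double demo_from_bits(uint64_t b)
{
    double x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Nonzero if negative, zero otherwise. */
static uint32_t demo_get_sign(double x)
{
    return (uint32_t)(demo_bits(x) >> 63);
}

/* Flip the sign bit. */
static double demo_negate(double x)
{
    return demo_from_bits(demo_bits(x) ^ DEMO_SIGN_BIT);
}

/* Set the sign bit: negative values are unchanged. */
static double demo_set_negative(double x)
{
    return demo_from_bits(demo_bits(x) | DEMO_SIGN_BIT);
}

/* Clear the sign bit: positive values are unchanged. */
static double demo_set_positive(double x)
{
    return demo_from_bits(demo_bits(x) & ~DEMO_SIGN_BIT);
}
```

Because only one bit changes, the exponent and mantissa of the value are never touched, which is why forcing a sign cannot introduce rounding error.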

FlpVersion

Purpose

Returns the version number of the Float Manager.

Prototype

UInt32 FlpVersion (void)

Parameters

None.

Result

Returns the version number of the Float Manager. The current version is represented by the constant flpVersion, which is defined in FloatMgr.h.

Compatibility

Implemented only if 2.0 New Feature Set is present.