Bits for exponent and fraction in an IEEE 754-2008 floating-point value?

Discussion:

ольга крыжановская

2013-09-02 07:24:00 UTC

Is there any algorithm to find out the number of bits used for the
exponent and fraction of a floating point type (e.g. one of binary16,
binary32, binary64, binary80, binary128), if I only know the number of
bytes used by the type?

Olga

--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ ***@gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`

Fred J. Tydeman

2013-09-02 14:39:32 UTC

Post by ольга крыжановская
Is there any algorithm to find out the number of bits used for the
exponent and fraction of a floating point type (e.g. one of binary16,
binary32, binary64, binary80, binary128), if I only know the number of
bytes used by the type?

There are a couple of programs to find out those things:
W. J. Cody: MACHAR
W. M. Kahan: Paranoia

One test in Paranoia is:
(( 4. / 3. - 1.) - 1. / 4.) * 3. - 1. / 4.

1./3. in binary is 0.010101... (the simplest repeating pattern).
1. is exactly representable and a power of the radix.
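1. + 1./3., that is 4./3., is an integer plus an infinite fraction that
must be cut off to fit the significand. The expression is exactly zero
in real arithmetic, so whatever survives in floating point is the
rounding error of 4./3., and its magnitude reveals the precision.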
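
A minimal C sketch of that test for double, under some assumptions: the radix is 2, each step is rounded to double (the volatile stores are there to encourage that), and the helper name double_precision_bits is made up for illustration. It is not taken from Paranoia itself.

```c
#include <float.h>

/* Hypothetical helper: estimate the precision of double (significand
 * bits, hidden bit included) from the Paranoia-style expression
 * ((4/3 - 1) - 1/4) * 3 - 1/4. Assumes a binary radix. */
static int double_precision_bits(void)
{
    /* volatile forces every intermediate result to be stored as a
     * double, so wider registers or algebraic simplification by the
     * compiler do not hide the rounding error */
    volatile double four_thirds = 4.0 / 3.0;
    volatile double third = four_thirds - 1.0;
    volatile double twelfth = third - 0.25;
    volatile double quarter = twelfth * 3.0;
    volatile double err = quarter - 0.25;

    double r = err;
    if (r < 0.0)
        r = -r;            /* only the magnitude matters */
    if (r == 0.0)
        return 0;          /* no rounding error seen: test inconclusive */

    /* r is expected to be 2^-(p-1); count doublings until it reaches 1 */
    int p = 1;
    while (r < 1.0) {
        r *= 2.0;
        p++;
    }
    return p;
}
```

On an IEEE binary64 double the result should agree with DBL_MANT_DIG from <float.h>. Note that this only yields the precision, not the exponent width.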

---
Fred J. Tydeman Tydeman Consulting
***@tybor.com Testing, numerics, programming
+1 (775) 287-5904 Vice-chair of PL22.11 (ANSI "C")
Sample C99+FPCE tests: http://www.tybor.com
Savers sleep well, investors eat well, spenders work forever.

ольга крыжановская

2013-09-02 15:24:29 UTC

Fred, thank you, but this was not the information I was looking for. I
need the number of bits used by exponent and fraction of a specific
IEEE 754-2008 floating point type, so that I can dissect and extract
those bits for manual manipulation. I am trying to write two
functions, one to extract the integer payload of a NaN value, and a
2nd function to create a NaN value with a specific integer payload.

Olga

_______________________________________________
Pcc mailing list
http://lists.ludd.ltu.se/cgi-bin/mailman/listinfo/pcc

Michael Shalayeff

2013-09-02 15:38:34 UTC

Post by ольга крыжановская
Fred, thank you, but this was not the information I was looking for. I
need the number of bits used by exponent and fraction of a specific
IEEE 754-2008 floating point type, so that I can dissect and extract
those bits for manual manipulation. I am trying to write two
functions, one to extract the integer payload of a NaN value, and a
2nd function to create a NaN value with a specific integer payload.

these are usually defined as constants in <float.h>
as FLT_*, DBL_* and LDBL_* constants respectively for
float, double and long double
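
A rough C sketch of how the field widths could be derived from those constants, assuming FLT_RADIX is 2 and an IEEE-style layout in which *_MAX_EXP equals 2^(w-1) for an exponent field of w bits. The helper name exp_bits_from_max_exp is hypothetical.

```c
#include <float.h>

/* Hypothetical helper: exponent field width for an IEEE-style binary
 * format, given its *_MAX_EXP constant from <float.h>.
 * Assumes *_MAX_EXP == emax + 1 == 2^(w-1), so w = log2(MAX_EXP) + 1. */
static int exp_bits_from_max_exp(int max_exp)
{
    int w = 1;
    while (max_exp > 1) {   /* count how often MAX_EXP can be halved */
        max_exp >>= 1;
        w++;
    }
    return w;
}

/* Stored fraction bits: *_MANT_DIG counts the leading significand bit,
 * which binary32 and binary64 do not store, hence the "- 1".
 * (Assumption: this does not hold for the x87 80-bit format, which
 * stores its leading bit explicitly.) */
#define FLT_FRACTION_BITS (FLT_MANT_DIG - 1)
#define DBL_FRACTION_BITS (DBL_MANT_DIG - 1)
```

With IEEE float and double this gives 8 and 23 bits for float, 11 and 52 bits for double. The size in bytes alone is not enough for long double: the 80-bit extended format is commonly padded to 12 or 16 bytes, so LDBL_MANT_DIG has to be consulted as well.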

--
paranoic mickey (my employers have changed but, the name has remained)

Szabolcs Nagy

2013-09-02 17:45:47 UTC

Post by Michael Shalayeff

these are usually defined as constants in <float.h>
as FLT_*, DBL_* and LDBL_* constants respectively for
float, double and long double

if ieee formats are assumed according to iso c annex f (which you should
on any modern system) then the bit representation is well defined
(except byteorder and maybe other quirks, but that's not an issue for
float and double as they fit into integer types unlike the binary128
and 80bit extended precision format, however long double can be tricky
for other reasons as well)

i'm fairly sure wikipedia has a summary of all the formats
(the standard is not freely available but you can find it with google)

what do you use the nan payload for?

ольга крыжановская

2013-09-02 22:17:50 UTC

CERN has requested the ability to create a nan value with payload in
our shell interpreter, and extract it later. I think the point is to
make error detection possible, without littering the code with a lot
of if() branches in the hot areas.

Olga

Fred J. Tydeman

2013-09-03 14:43:40 UTC

Post by ольга крыжановская
CERN has requested the ability to create a nan value with payload in
our shell interpreter, and extract it later. I think the point is to
make error detection possible, without littering the code with a lot
of if() branches in the hot areas.

How to tell a Quiet NaN from a Signaling NaN is not well defined in
the older IEEE-754-1985 (which is what almost all chips are designed to).

The C committee is working on a binding between C11 and IEEE-754-2008.
We have come up with some functions for working with payloads.

14.10 NaN functions

IEC 60559 defines the payload of a NaN to be a certain part of the NaN's significand interpreted as an integer.
The payload is intended to provide implementation-defined diagnostic information about the NaN, such as
where or how the NaN was created. The following suggested changes to C11 provide functions to get and set
the NaN payloads defined in IEC 60559.

Suggested change to C11:

F.10.13 Payload functions

F.10.13.1 The getpayload functions

Synopsis
[1]
#define __STDC_WANT_IEC_18661_EXT1__
#include <math.h>
double getpayload(const double *x );
float getpayloadf(const float *x );
long double getpayloadl(const long double *x );

Description
[2] The getpayload functions extract the integer value of the payload of a NaN input and return the
integer as a floating-point value. The sign of the returned integer is positive. If *x is not a NaN, the
return result is unspecified. These functions raise no floating-point exceptions, even if *x is a
signaling NaN.

Returns
[3] The functions return a floating-point representation of the integer value of the payload of the NaN
input.

F.10.13.2 The setpayload functions

Synopsis
[1]
#define __STDC_WANT_IEC_18661_EXT1__
#include <math.h>
int setpayload(double *res, double pl);
int setpayloadf(float *res, float pl);
int setpayloadl(long double *res, long double pl);

Description
[2] The setpayload functions create a quiet NaN with the payload specified by pl and a zero sign
bit and store that NaN into the object pointed to by *res. If pl is not a positive floating-point integer
representing a valid payload, *res is set to positive zero.

Returns
[3] If the functions stored the specified NaN, the functions return a zero value, otherwise a non-zero
value (and *res is set to zero).

F.10.13.3 The setpayloadsig functions

Synopsis
[1]
#define __STDC_WANT_IEC_18661_EXT1__
#include <math.h>
int setpayloadsig(double *res, double pl);
int setpayloadsigf(float *res, float pl);
int setpayloadsigl(long double *res, long double pl);

Description
[2] The setpayloadsig functions create a signaling NaN with the payload specified by pl and a
zero sign bit and store that NaN into the object pointed to by *res. If pl is not a positive floating-
point integer representing a valid payload, *res is set to positive zero.

Returns
[3] If the functions stored the specified NaN, the functions return a zero value, otherwise a non-zero
value (and *res is set to zero).
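
For comparison, a bit-level sketch for binary64 that keeps the payload in an integer rather than a floating-point value. This is not the proposed API. It assumes double is IEEE binary64, that the most significant fraction bit (bit 51) is the quiet bit as on most current hardware, and that the remaining 51 fraction bits form the payload. The names nan_make, nan_payload and the F64_* masks are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Assumed binary64 layout: 1 sign bit, 11 exponent bits, 52 fraction bits */
#define F64_EXP_MASK     UINT64_C(0x7FF0000000000000)
#define F64_FRAC_MASK    UINT64_C(0x000FFFFFFFFFFFFF)
#define F64_QUIET_BIT    UINT64_C(0x0008000000000000) /* assumed quiet-NaN flag */
#define F64_PAYLOAD_MASK UINT64_C(0x0007FFFFFFFFFFFF) /* low 51 fraction bits */

/* Hypothetical: store a positive quiet NaN carrying `payload` in *res.
 * Returns 0 on success; non-zero (and *res = +0.0) if the payload
 * does not fit, mirroring the error convention of the draft text. */
static int nan_make(double *res, uint64_t payload)
{
    if (payload > F64_PAYLOAD_MASK) {
        *res = 0.0;
        return 1;
    }
    uint64_t bits = F64_EXP_MASK | F64_QUIET_BIT | payload;
    memcpy(res, &bits, sizeof bits);   /* memcpy avoids aliasing problems */
    return 0;
}

/* Hypothetical: extract the payload of the NaN in *x into *payload.
 * Returns 0 on success, non-zero if *x is not a NaN. */
static int nan_payload(const double *x, uint64_t *payload)
{
    uint64_t bits;
    memcpy(&bits, x, sizeof bits);
    /* a NaN has an all-ones exponent and a non-zero fraction */
    if ((bits & F64_EXP_MASK) != F64_EXP_MASK || (bits & F64_FRAC_MASK) == 0)
        return 1;
    *payload = bits & F64_PAYLOAD_MASK;
    return 0;
}
```

Both functions go through memory with memcpy, so the value never has to pass through a floating-point operation that might alter the NaN.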

Szabolcs Nagy

2013-09-03 15:58:28 UTC

Post by Fred J. Tydeman
The C committee is working on a binding between C11 and IEEE-754-2008.
We have come up with some functions for working with payloads.

i've seen that paper and i still don't see why ppl do this

nan is not guaranteed to go through all arithmetics unchanged
(eg when there are two nan inputs to an operation)

it also does not go through a printf/scanf cycle, so serialization
is problematic, and one cannot create nan-with-payload literals
in the source code

so it seems to me c can only support a single nan value properly
(no difference between different qnans and snans) anything else
will introduce portability issues and other breakage
(not to mention math library issues)

Post by Fred J. Tydeman
double getpayload(const double *x );
float getpayloadf(const float *x );
long double getpayloadl(const long double *x );
int setpayload(double *res, double pl);
int setpayloadf(float *res, float pl);
int setpayloadl(long double *res, long double pl);

floating-point nan payload is problematic

language implementors often use nan payload to efficiently store
non-floating-point data in a floating point value, others use
it for efficient error code propagation

if the getpayload stores the result into a fpu register
instead of into an integer one where it did the bit
manipulation in the first place then ppl who do the
bithacks for efficiency will not use such an api

error handling is not very nice either in case of setpayload
(why does it set res to 0 on error?)

Fred J. Tydeman

2013-09-03 18:58:02 UTC

Post by Szabolcs Nagy

Post by Fred J. Tydeman
The C committee is working on a binding between C11 and IEEE-754-2008.
We have come up with some functions for working with payloads.

i've seen that paper and i still don't see why ppl do this

We believe that some users want the ability to use payloads.

Post by Szabolcs Nagy
floating-point nan payload is problematic

There is no requirement for 128-bit integers, so using a 128-bit
floating-point is the natural place to store the payload.

Post by Szabolcs Nagy
error handling is not very nice either in case of setpayload
(why does it set res to 0 on error?)

0 means success. non-zero means failure.

ольга крыжановская

2013-09-02 16:11:26 UTC

I've been looking at the <float.h> constants for a while, but without
success. I need to be able to programmatically create the mask and
shift values to extract exponent and fraction. I am trying to avoid
hard-coding them, because that is not very portable.

Olga

ольга крыжановская

2013-09-02 22:18:51 UTC

I know the table data. I am trying to figure out whether I can calculate
the values at run time, so I do not have to hard-code the bit/shift
values.

Olga

Binary FP formats

Total   Sign   Exponent   Significand
width   bit    bits       bits
  16     1        5          10
  32     1        8          23
  64     1       11          52
  80     1       15          64
 128     1       15         112

Fred J. Tydeman

2013-09-03 05:03:19 UTC

Post by ольга крыжановская
I know the table data. I am trying to figure out whether I can calculate
the values at run time, so I do not have to hard-code the bit/shift
values.

Here are the values and formulas for large widths. But, since
few hardware chips have 128-bit widths, I do not think the
formulas for even larger widths are going to be that helpful.

Table 3.5 - Binary interchange format parameters

Parameter                          bin16  bin32  bin64  bin128  binary{k} (k >= 128)
k, storage width in bits              16     32     64     128  multiple of 32
p, precision in bits                  11     24     53     113  k - round(4*log2(k)) + 13
emax, maximum exponent e              15    127   1023   16383  2**(k-p-1) - 1

Encoding parameters
bias, E - e                           15    127   1023   16383  emax
sign bit                               1      1      1       1  1
w, exponent field width in bits        5      8     11      15  round(4*log2(k)) - 13
t, trailing significand field
   width in bits                      10     23     52     112  k - w - 1
k, storage width in bits              16     32     64     128  1 + w + t

The function round( ) in Table 3.5 rounds to the nearest integer.

For example, binary256 would have p = 237 and emax = 262143.
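
The formulas above can be sketched in C roughly as follows. The struct and function names are hypothetical, the three small widths are simply looked up, and round(4*log2(k)) is evaluated through k^8 so that no libm call is needed. As an arbitrary limit of this sketch, widths above 2048 bits are rejected so that k^8 stays exact in a double and emax fits an unsigned long.

```c
/* Hypothetical container for the parameters of Table 3.5 */
struct binfmt {
    unsigned k;          /* storage width in bits */
    unsigned w;          /* exponent field width */
    unsigned t;          /* trailing significand field width */
    unsigned p;          /* precision, t plus the implicit bit */
    unsigned long emax;  /* maximum exponent, also the bias */
};

/* round(4*log2(k)) without libm: n is the integer with
 * 2^(2n-1) <= k^8 < 2^(2n+1), i.e. n = (floor(log2(k^8)) + 1) / 2. */
static unsigned round_4log2(unsigned k)
{
    double x = 1.0;
    for (int i = 0; i < 8; i++)
        x *= (double)k;              /* k^8, exact for k <= 2048 */
    unsigned e = 0;
    while (x >= 2.0) {               /* e = floor(log2(k^8)) */
        x /= 2.0;
        e++;
    }
    return (e + 1) / 2;
}

/* Fill *f for binary{k}; returns 0 on success, -1 for unsupported k. */
static int binary_format(unsigned k, struct binfmt *f)
{
    unsigned w;
    switch (k) {
    case 16: w = 5;  break;          /* small formats: table lookup */
    case 32: w = 8;  break;
    case 64: w = 11; break;
    default:
        if (k < 128 || k > 2048 || k % 32 != 0)
            return -1;
        w = round_4log2(k) - 13;     /* formula column of Table 3.5 */
        break;
    }
    f->k = k;
    f->w = w;
    f->t = k - w - 1;
    f->p = f->t + 1;                 /* same as k - round(4*log2(k)) + 13 */
    f->emax = (1UL << (w - 1)) - 1;  /* same as 2**(k-p-1) - 1 */
    return 0;
}
```

The 80-bit extended format is deliberately not covered: it is not an interchange format from this table and stores its leading significand bit explicitly.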

Szabolcs Nagy

2013-09-03 11:24:05 UTC

why?

as i said float formats are well-defined, for long double use
preprocessor #if LDBL_MANT_DIG==

if you don't hardcode the format at compile time you most
likely invoke undefined behaviour
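
a possible shape for that compile-time selection, as a sketch only: the LD_* macro names are made up, and only three long double layouts are recognised. Anything else (for example a double-double long double) stops the build instead of guessing.

```c
#include <float.h>
#include <limits.h>

#if FLT_RADIX != 2
#error "this sketch assumes a binary floating-point radix"
#endif

/* Pick the long double layout at compile time from <float.h>.
 * LD_SIG_BITS is the number of stored significand bits;
 * LD_EXPLICIT_INT_BIT says whether the leading bit is stored. */
#if LDBL_MANT_DIG == 53 && LDBL_MAX_EXP == 1024
  /* long double is the same as binary64 */
# define LD_EXP_BITS 11
# define LD_SIG_BITS 52
# define LD_EXPLICIT_INT_BIT 0
#elif LDBL_MANT_DIG == 64 && LDBL_MAX_EXP == 16384
  /* x87 80-bit extended precision, usually padded in memory */
# define LD_EXP_BITS 15
# define LD_SIG_BITS 64
# define LD_EXPLICIT_INT_BIT 1
#elif LDBL_MANT_DIG == 113 && LDBL_MAX_EXP == 16384
  /* binary128 */
# define LD_EXP_BITS 15
# define LD_SIG_BITS 112
# define LD_EXPLICIT_INT_BIT 0
#else
# error "unrecognised long double format"
#endif
```

the sign, exponent and significand widths then add up to at most sizeof(long double) * CHAR_BIT, with the difference being padding in the 80-bit case.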

Fred J. Tydeman

2013-09-02 16:18:22 UTC

Binary FP formats

Total   Sign   Exponent   Significand
width   bit    bits       bits
  16     1        5          10
  32     1        8          23
  64     1       11          52
  80     1       15          64
 128     1       15         112
