Discussion:
Does pcc support a real 128bit long double on sparc64?
ольга крыжановская
2012-08-11 16:02:10 UTC
Does pcc support real 128bit long double on sparc64, i.e. issues the
correct 128bit float operations as specified in sparcv9?

Olga
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ ***@gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
Michael Shalayeff
2012-08-11 16:11:26 UTC
Post by ольга крыжановская
Does pcc support real 128bit long double on sparc64, i.e. issues the
correct 128bit float operations as specified in sparcv9?
most cpus (and not only sparc) do not even implement the complete
instruction set for smaller floats, so using the big ones
would be a waste anyway. usually when higher precision is
required people use libraries accordingly...
cu
--
paranoic mickey (my employers have changed but, the name has remained)
Irek Szczesniak
2012-08-11 16:57:33 UTC
Post by Michael Shalayeff
Post by ольга крыжановская
Does pcc support real 128bit long double on sparc64, i.e. issues the
correct 128bit float operations as specified in sparcv9?
most cpus (and not only sparc) do not even implement the complete
instruction set for smaller floats, so using the big ones
would be a waste anyway. usually when higher precision is
required people use libraries accordingly...
The SPARCv9 architecture *specification* includes registers and
instructions for 128bit floating point. So far this is emulated via
traps, except for the Fujitsu SPARC64 IXfx which implements *most* of
these instructions in hardware.
Notable is that SPARC64 is an out of order/speculative design which
supports entering traps speculatively. On the first 128bit fp
instruction the trap is entered non-speculatively, but subsequent
executions are entered based on predictive trap cache logic and thus
run *MUCH* faster (factor 1.2-2.8 depending on CPU model) than the
libc emulation. So in scenarios like loops smaller than 8192-16
instructions the native sparcv9 128bit fp instructions are faster.

IMO the correct way for pcc would be to implement 128bit floating
point instructions as defined in sparcv9, and later add CPU flags for
ultrasparc to use the libc emulation *OPTIONALLY*.
Please do *NOT* do this by default like gcc does - it defaults to
-msoft-quad-float for all sparcv9 architectures which causes a
*devastating* performance penalty on SPARC64 IXfx where
-mhard-quad-float would be much more applicable (in fact
-mhard-quad-float is recommended (on performance grounds) for all
Fujitsu SPARC64 CPUs including and newer than SPARC64 V+).

Irek
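
To make the difference between the two gcc settings concrete, here is a minimal sketch of a long double kernel whose generated assembly could be compared. The function name `dot_ld`, the file name `dot.c` and the exact command lines in the comment are assumptions for illustration; only the -mhard-quad-float and -msoft-quad-float flags come from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Dot product in long double. On sparcv9 each multiply and add below
 * is compiled either to a quad-precision instruction or to a call into
 * a support library, depending on the compiler's quad-float setting.
 *
 * Assumed invocations for comparing the generated assembly:
 *   gcc -m64 -O2 -mhard-quad-float -S dot.c
 *   gcc -m64 -O2 -msoft-quad-float -S dot.c
 */
long double dot_ld(const long double *a, const long double *b, size_t n)
{
    long double sum = 0.0L;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];   /* one quad multiply and one quad add */
    return sum;
}
```

The source is identical in both builds; only the code the compiler emits for the loop body changes, which is why the choice of default matters.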
Michael Shalayeff
2012-08-11 20:47:05 UTC
Post by Irek Szczesniak
Post by Michael Shalayeff
Post by ольга крыжановская
Does pcc support real 128bit long double on sparc64, i.e. issues the
correct 128bit float operations as specified in sparcv9?
most cpus (and not only sparc) do not even implement the complete
instruction set for smaller floats, so using the big ones
would be a waste anyway. usually when higher precision is
required people use libraries accordingly...
The SPARCv9 architecture *specification* includes registers and
instructions for 128bit floating point. So far this is emulated via
traps, except for the Fujitsu SPARC64 IXfx which implements *most* of
these instructions in hardware.
you are answering your own argument.
once -- it implements most but not all.
twice -- it's only one totally uncommon cpu type.
Post by Irek Szczesniak
Notable is that SPARC64 is an out of order/speculative design which
supports entering traps speculatively. On the first 128bit fp
instruction the trap is entered non-speculatively, but subsequent
executions are entered based on predictive trap cache logic and thus
run *MUCH* faster (factor 1.2-2.8 depending on CPU model) than the
libc emulation. So in scenarios like loops smaller than 8192-16
instructions the native sparcv9 128bit fp instructions are faster.
you are not making much sense... you start talking about only one cpu
type and then provide cpu-dependent stats without making clear how
they apply to this _one_ cpu type. in any case i cannot believe that
a context switch to emulate would ever be faster than a direct
function call. this just does not fit anyhow (:
perhaps you are talking about a specific emulation for one cpu
type again but then it will not work for a common user.
Post by Irek Szczesniak
IMO the correct way for pcc would be to implement 128bit floating
point instructions as defined in sparcv9, and later add CPU flags for
ultrasparc to use the libc emulation *OPTIONALLY*.
imho gcc does it the right way. the most common case is NOT
a super obscure single cpu type. if you as a software developer
are optimising for your actual hardware -- that's your task.
most common average usparc would not have any of that stuff
implemented...
cu
--
paranoic mickey (my employers have changed but, the name has remained)
Joshuah Hurst
2012-08-12 09:23:46 UTC
Post by Michael Shalayeff
Post by Irek Szczesniak
Post by Michael Shalayeff
Post by ольга крыжановская
Does pcc support real 128bit long double on sparc64, i.e. issues the
correct 128bit float operations as specified in sparcv9?
most cpus (and not only sparc) do not even implement the complete
instruction set for smaller floats, so using the big ones
would be a waste anyway. usually when higher precision is
required people use libraries accordingly...
The SPARCv9 architecture *specification* includes registers and
instructions for 128bit floating point. So far this is emulated via
traps, except for the Fujitsu SPARC64 IXfx which implements *most* of
these instructions in hardware.
you are answering your own argument.
once -- it implements most but not all.
Maybe you don't understand the classifications:
1. sparcv9 is the specification which defines the overall instruction set
2. ultrasparc is a family of sparcv9 implementations sold by Sun
Microsystems which implement a subset of sparcv9
3. sparc64 is a family of sparcv9 implementations sold by Fujitsu
which implement a subset of sparcv9
Post by Michael Shalayeff
twice -- it's only one totally uncommon cpu type.
sparc64 accounts for 80% of the cpus implementing sparcv9 sold since
2006. So this is a damn lot of uncommon cpu type in relation to
sparcv9 implementations. Of course not so much compared to i386
implementations, but that's not part of the discussion here.

Finally, there is *now* the Fujitsu SPARC64 IXfx (available since 2012 and
now trickling into shops that run servers), which implements almost all
of the quadfloat instructions in hardware. This gives the sparc64
family an even greater advantage.
Post by Michael Shalayeff
Post by Irek Szczesniak
Notable is that SPARC64 is an out of order/speculative design which
supports entering traps speculatively. On the first 128bit fp
instruction the trap is entered non-speculatively, but subsequent
executions are entered based on predictive trap cache logic and thus
run *MUCH* faster (factor 1.2-2.8 depending on CPU model) than the
libc emulation. So in scenarios like loops smaller than 8192-16
instructions the native sparcv9 128bit fp instructions are faster.
you are not making much sense... you start talking about only one cpu
type and then provide cpu-dependent stats without making clear how
they apply to this _one_ cpu type. in any case i cannot believe that
a context switch to emulate would ever be faster than a direct
It fits perfectly if you try to understand what Irek wrote:
He said that the first time a specific instruction triggers an
emulation trap it will be slow, but if you repeat the same instruction
it will cost about as much as a function call because
the sparc64 cpus "remember" (trap cache) where the trap went.
Emulation traps also do not constitute a context switch in sparcv9.
Code cache lines associated with traps are also "locked" for a certain
number of CPU cycles (16384? Irek?) so that an emulation trap
does not suffer from repeated cache misses or TLB lookups; this gives traps
a significant benefit over plain function calls.
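
The trap-versus-library-call cost could be measured rather than argued. Below is a minimal timing sketch, assuming the same file can be built once with quad instructions and once with library calls; the function name `time_quad_loop` is hypothetical and the loop is only a stand-in for a real workload.

```c
#include <assert.h>
#include <time.h>

/*
 * Run `iters` dependent long double multiply-adds and return the CPU
 * time used, in seconds. The final value is stored in *out so the
 * caller can check it.
 */
double time_quad_loop(unsigned long iters, long double *out)
{
    volatile long double x = 1.0L;  /* volatile: keep every FP operation */
    const long double k = 1.0L;

    assert(out != NULL);
    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++)
        x = x * k + 0.0L;           /* same quad operations every iteration */
    clock_t end = clock();

    *out = x;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Running this with a large iteration count under both builds, on the CPU model in question, would show which path is faster there; the sketch itself makes no claim about the outcome.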
Post by Michael Shalayeff
perhaps you are talking about a specific emulation for one cpu
type again but then it will not work for a common user.
See my first paragraph.
Post by Michael Shalayeff
Post by Irek Szczesniak
IMO the correct way for pcc would be to implement 128bit floating
point instructions as defined in sparcv9, and later add CPU flags for
ultrasparc to use the libc emulation *OPTIONALLY*.
imho gcc does it the right way. the most common case is NOT
a super obscure single cpu type. if you as a software developer
are optimising for your actual hardware -- that's your task.
most common average usparc would not have any of that stuff
implemented...
As I said, all newer sparc64 cpus (I talked about those sold since
2006, that's just 80% of the sparcv9 market) benefit from
-mhard-quad-float.
Then there is the _formal_ issue: If you specify the sparcv9 ABI you
want the sparcv9 ABI and not some half-arsed guesswork that the
compiler folks thought was correct. gcc is just wrong in defaulting
to -msoft-quad-float, in blatant violation of the sparcv9
specification.
So: sparcv9 means sparcv9, if you want cpu-specific workarounds like
calling into libc for quadfloat emulation without traps you have to
specify that via -mcpu=ultrasparc.
Also embedded sparcv9 does not have the libc quadfloat emulation but
does have the trap emulation, so for interoperability the compiler
should issue the sparcv9 quadfloat instructions and any libc emulation
calls are OS specific.

Josh
Michael Shalayeff
2012-08-12 09:59:34 UTC
Post by Joshuah Hurst
Post by Michael Shalayeff
twice -- it's only one totally uncommon cpu type.
sparc64 accounts for 80% of the cpus implementing sparcv9 sold since
2006. So this is a damn lot of uncommon cpu type in relation to
sparcv9 implementations. Of course not so much compared to i386
implementations, but that's not part of the discussion here.
you are mixing everything in one pot.
out of all sparc64 cpus out there existing in various machines
this one super-new cpu only shipping since this year is completely
uncommon and obscure, especially as not so many operating systems
would even support it, given that it only came out this year.
thus having any default setting suited for one very super-new
and completely uncommon case of usage does not make any sense.
Post by Joshuah Hurst
He said that the first time a specific instruction triggers an
emulation trap it will be slow, but if you repeat the same instruction
it will cost about as much as a function call because
the sparc64 cpus "remember" (trap cache) where the trap went.
i'm very well aware how cpus work.
the only thing that seems to win out is common usage, not
some theoretical model based on a specification.
cache is finite and the same instruction in a loop is not
such a common case as you might think.
of course maybe for this one application you are developing
it might be the case -- that's why compilers have options.
Post by Joshuah Hurst
As I said, all newer sparc64 cpus (I talked about those sold since
2006, that's just 80% of the sparcv9 market) benefit from
-mhard-quad-float.
i'm sorry to disappoint you but most of those instructions
are emulated in the kernel. of course various cpu models
implement this and that piece but no you cannot get away
without emulation.
and as much as you preach cache and other trap mumbo jumbo
it will never be faster than library calls on average.
it is in most modern unix systems a context switch.

cu
--
paranoic mickey (my employers have changed but, the name has remained)
Thorsten Glaser
2012-08-12 13:14:11 UTC
Post by Joshuah Hurst
1. sparcv9 is the specification which defines the overall instruction set
2. ultrasparc is a family of sparcv9 implementations sold by Sun
Microsystems which implement a subset of sparcv9
3. sparc64 is a family of sparcv9 implementations sold by Fujitsu
which implement a subset of sparcv9
That’s the nice thing about nomenclature: diversity.

4. sparc64 is also the name of the architecture for 64-bit
SPARC CPUs and their usually associated hardware, just
like amd64 is the name for 64-bit AMD and late Intel CPUs.

And that’s the real common use of “sparc64” in OS/toolchain development,
which uses its own, mostly somewhat consistent, naming instead of
following every vendor’s (IA32 retrofitting) or GNU’s (x86_64 WTF?) whim.

bye,
//mirabilos
--
[...] if maybe ext3fs wasn't a better pick, or jfs, or maybe reiserfs, oh but
what about xfs, and if only i had waited until reiser4 was ready... in the
beginning, there was ffs, and in the middle, there was ffs, and at the end, there
was still ffs, and the sys admins knew it was good. :) -- Ted Unangst über *fs
Fred J. Tydeman
2012-08-12 15:33:56 UTC
Post by Joshuah Hurst
1. sparcv9 is the specification which defines the overall instruction set
2. ultrasparc is a family of sparcv9 implementations sold by Sun
Microsystems which implement a subset of sparcv9
3. sparc64 is a family of sparcv9 implementations sold by Fujitsu
which implement a subset of sparcv9
I have five little C test programs posted to a public web site
http://www.tybor.com
that show how bad/good C compilers and libraries are.
The results are in the comments in the C source at the start
of each file.

So, how well does sparcv9 do with those tests?


---
Fred J. Tydeman Tydeman Consulting
***@tybor.com Testing, numerics, programming
+1 (775) 287-5904 Vice-chair of PL22.11 (ANSI "C")
Sample C99+FPCE tests: http://www.tybor.com
Savers sleep well, investors eat well, spenders work forever.
Anders Magnusson
2012-08-18 15:56:25 UTC
Post by ольга крыжановская
Does pcc support real 128bit long double on sparc64, i.e. issues the
correct 128bit float operations as specified in sparcv9?
Setting aside the rest of the discussion: no, long double support has
not been added to sparc64 at all yet.

-- Ragge
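
For anyone who wants to check what a particular compiler and target actually provide for long double, here is a minimal sketch using only the standard <float.h> macros. The function names are hypothetical. It reports only the format of the type; it does not show whether the compiler emits quad instructions or library calls, which requires looking at the generated assembly.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Classify the long double format from its mantissa width in bits. */
const char *long_double_format(void)
{
#if LDBL_MANT_DIG == 113
    return "IEEE 754 binary128 (quad)";
#elif LDBL_MANT_DIG == 106
    return "double-double";
#elif LDBL_MANT_DIG == 64
    return "x87 80-bit extended";
#elif LDBL_MANT_DIG == 53
    return "same as double (binary64)";
#else
    return "other";
#endif
}

/* Print what the compiler in use provides for long double. */
void report_long_double(void)
{
    /* The C standard requires long double to be at least as precise as double. */
    assert(LDBL_MANT_DIG >= DBL_MANT_DIG);
    printf("sizeof(long double) = %zu bytes\n", sizeof(long double));
    printf("LDBL_MANT_DIG       = %d bits\n", LDBL_MANT_DIG);
    printf("format              = %s\n", long_double_format());
}
```

Calling `report_long_double()` from `main` on the target in question prints the size, mantissa width and a best-guess format name.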
