Post by Michael Shalayeff
Post by Irek Szczesniak
Post by Michael Shalayeff
Post by ольга крыжановская
Does pcc support real 128bit long double on sparc64, i.e. does it issue
the correct 128bit float operations as specified in sparcv9?
most cpus (and not only sparc) do not even implement the complete
instruction set for smaller floats, so using the big ones
would be a waste anyway. usually when higher precision is
required people use libraries accordingly...
The SPARCv9 architecture *specification* includes registers and
instructions for 128bit floating point. So far this is emulated via
traps, except for the Fujitsu SPARC64 IXfx which implements *most* of
these instructions in hardware.
you are answering your own argument.
once -- it implements most but not all.
Maybe you don't understand the classifications:
1. sparcv9 is the specification which defines the overall instruction set
2. ultrasparc is a family of sparcv9 implementations sold by Sun
Microsystems which implement a subset of sparcv9
3. sparc64 is a family of sparcv9 implementations sold by Fujitsu
which implement a subset of sparcv9
Post by Michael Shalayeff
twice -- it's only one totally uncommon cpu type.
sparc64 accounts for 80% of the cpus implementing sparcv9 sold since
2006. So this is a damn lot of uncommon cpu type in relation to
sparcv9 implementations. Of course not so much compared to i386
implementations, but that's not part of the discussion here.
Finally, there is *now* the Fujitsu SPARC64 IXfx (available since 2012 and
now trickling into outfits running servers), which implements almost all
of the quadfloat instructions in hardware. This gives the sparc64
family an even greater advantage.
Post by Michael Shalayeff
Post by Irek Szczesniak
Notable is that SPARC64 is an out of order/speculative design which
supports entering traps speculatively. On the first 128bit fp
instruction the trap is entered non-speculatively, but subsequent
executions are entered based on a predictive trap cache logic and run
thus *MUCH* faster (factor 1.2-2.8 depending on CPU model) than the
libc emulation. So in scenarios like loops smaller than 8192-16
instructions the native sparcv9 128bit fp instructions are faster.
you are not making much sense... you start talking about only one cpu
type and then provide cpu-dependent stats without making clear how
they apply to this _one_ cpu type. in any case i cannot believe that
a context switch to emulate would be ever faster than a direct
It fits perfectly if you try to understand what Irek wrote:
He said that the first time a specific instruction triggers an
emulation trap it will be slow, but if you repeat the same instruction
it will cost about as much as a function call, because
the sparc64 cpus "remember" (trap cache) where the trap went.
Emulation traps also do not constitute a context switch in sparcv9.
Code cache lines associated with traps are also "locked" for a certain
number of CPU cycles (16384? Irek?) so that an emulation trap does not
suffer from repeated cache misses or TLB lookups; this gives traps
a significant benefit over plain function calls.
Post by Michael Shalayeff
perhaps you are talking about a specific emulation for one cpu
type again but then it will not work for a common user.
See my first paragraph.
Post by Michael Shalayeff
Post by Irek Szczesniak
IMO the correct way for pcc would be to implement 128bit floating
point instructions as defined in sparcv9, and later add CPU flags for
ultrasparc to use the libc emulation *OPTIONALLY*.
imho gcc does it the right way. the most common case is NOT a
super obscure single cpu type. if you as a software developer
are optimising for your actual hardware -- that's your task.
most common average usparc would not have any of that stuff
implemented...
As I said, all newer sparc64 cpus (I talked about those sold since
2006, that's just 80% of the sparcv9 market) benefit from
-mhard-quad-float.
Then there is the _formal_ issue: If you specify the sparcv9 ABI you
want the sparcv9 ABI and not some half-arsed guesswork that the
compiler folks thought was correct. gcc is just wrong with defaulting
to -msoft-quad-float, in blatant violation of the sparcv9
specification.
So: sparcv9 means sparcv9, if you want cpu-specific workarounds like
calling into libc for quadfloat emulation without traps you have to
specify that via -mcpu=ultrasparc.
Also embedded sparcv9 does not have the libc quadfloat emulation but
does have the trap emulation, so for interoperability the compiler
should issue the sparcv9 quadfloat instructions and any libc emulation
calls are OS specific.
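
For anyone who wants to see what their compiler actually does here, a
minimal C sketch follows. It is only an illustration, not anything pcc
or gcc ships: the function names long_double_is_binary128 and quad_sum
are made up for this example. The first function reports whether long
double is the 128bit format (113-bit significand), the second does
nothing but long double additions, so its generated assembly shows
whether the compiler issued the quadfloat instruction (faddq) or a call
into a support library.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when long double carries a 113-bit
 * significand, which is what the 128bit quadfloat format provides. */
int long_double_is_binary128(void)
{
    return LDBL_MANT_DIG == 113;
}

/* Hypothetical helper: sums n long double values. Each addition in the
 * loop is a long double add, so the assembly for this function shows
 * how the compiler lowers quadfloat arithmetic on the target. */
long double quad_sum(const long double *values, size_t n)
{
    assert(values != NULL || n == 0);

    long double total = 0.0L;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

Compiling this to assembly with gcc's -S, once with -mhard-quad-float
and once with -msoft-quad-float, and comparing the body of quad_sum
should show the difference discussed above. The exact library routine
names in the soft case depend on the ABI and the OS, so treat anything
you see there as platform specific.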
Josh