Discussion:
[Pcc-commit] CVS commit: pcc/arch/i386
Iain Hibbert
2014-05-27 15:55:44 UTC
Module Name: pcc
Committed By: ragge
Date: Sat May 24 20:11:26 UTC 2014
pcc/arch/i386: local2.c
Use "enter" if we have to subtract from stack. Saves a bunch of bytes per
function.
although apparently, it uses more cycles :)

http://ww2.cs.mu.oz.au/~mgiuca/x86/

iain
Anders Magnusson
2014-05-27 19:43:54 UTC
Post by Iain Hibbert
Module Name: pcc
Committed By: ragge
Date: Sat May 24 20:11:26 UTC 2014
pcc/arch/i386: local2.c
Use "enter" if we have to subtract from stack. Saves a bunch of bytes per
function.
although apparently, it uses more cycles :)
http://ww2.cs.mu.oz.au/~mgiuca/x86/
On a real i386, yes, but I really doubt that it does that on modern CPUs.
...and if the program is supposed to be run on a real i386, I assume
that size matters more than speed :-)

Hm, maybe time to add support for -Os? :-)

-- Ragge
Antoine Leca
2014-05-28 12:00:36 UTC
Post by Anders Magnusson
Post by Iain Hibbert
Use "enter" if we have to subtract from stack. Saves a bunch of
bytes per function.
although apparently, it uses more cycles :)
http://ww2.cs.mu.oz.au/~mgiuca/x86/
BTW: that page is slightly wrong about the number of bytes saved: the
non-enter 32-bit prologue is 6 bytes when addto is less than 128, and 9
bytes otherwise, not 12 bytes. Since enter is 4 bytes, there is still a
clear saving (2 or 5 bytes).
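The byte counts above can be checked with a small sketch. It assumes the usual IA-32 encodings (push %ebp = 55, mov %esp,%ebp = 89 E5, sub with imm8 = 83 EC ib, sub with imm32 = 81 EC id, enter = C8 iw ib); the function names are made up for illustration and are not taken from pcc.

```c
#include <stddef.h>

/* Size in bytes of the classic 32-bit prologue:
 *   push %ebp          1 byte  (55)
 *   mov  %esp,%ebp     2 bytes (89 E5)
 *   sub  $addto,%esp   3 bytes with a sign-extended imm8 (83 EC ib),
 *                      6 bytes with an imm32 (81 EC id)
 * Hypothetical helper, for illustration only. */
size_t classic_prologue_size(unsigned addto)
{
    size_t sub_size = (addto < 128) ? 3 : 6;
    return 1 + 2 + sub_size;
}

/* enter $addto,$0 is always C8 iw ib: 4 bytes.
 * The frame size is a 16-bit immediate, so addto must be <= 65535. */
size_t enter_prologue_size(void)
{
    return 4;
}

/* Bytes saved by using enter instead of the classic sequence. */
size_t prologue_bytes_saved(unsigned addto)
{
    return classic_prologue_size(addto) - enter_prologue_size();
}
```

With these assumptions, prologue_bytes_saved(16) gives 2 and prologue_bytes_saved(1000) gives 5.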
Post by Anders Magnusson
On a real i386, yes, but I really doubt that it does that on modern CPUs.
For modern CPUs, the fact is that I do not know (and I did not search for long).

But Intel, in its "Architecture Optimization" reference manual, states a
rule (currently #31, p. 3-21 of the March 2014 ed.) to avoid "complex
instructions" (sic) like enter, leave or loop, since they have more than
four micro-ops and require multiple cycles to decode.

This rule has been there for a long time now, so perhaps it is obsolete,
perhaps even long obsolete; perhaps it does not apply to AMD. Perhaps it
does not apply to enter x, $0; but the fact that it mentions leave makes
me think that, at least at some point, there was a penalty even for the
"simple" case. (Elementary reasoning suggests that if anything were to
be improved in the microcode design here, it would be fast decoding of
leave, since it is used quite often.)

Also interesting reading is the Atom-specific rule #19 (p. 14-11), which
states that leave is still emulated in microcode there; by the way, it
also recommends using lea rather than sub to adjust esp (which adds a
byte?)
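On the "adds byte?" question, a rough sketch of the encoded sizes, assuming the standard IA-32 encodings: sub $n,%esp is 83 EC ib or 81 EC id, while lea -n(%esp),%esp needs a SIB byte because the base register is esp (8D 64 24 disp8 or 8D A4 24 disp32). The function names are hypothetical.

```c
#include <stddef.h>

/* sub $n,%esp: 83 EC ib (3 bytes) when n fits a sign-extended imm8,
 * otherwise 81 EC id (6 bytes). */
size_t sub_esp_size(unsigned n)
{
    return (n < 128) ? 3 : 6;
}

/* lea -n(%esp),%esp: opcode + ModRM + SIB + displacement.
 * The displacement is -n, so disp8 covers n up to 128:
 * 8D 64 24 disp8 (4 bytes), otherwise 8D A4 24 disp32 (7 bytes). */
size_t lea_esp_size(unsigned n)
{
    return (n <= 128) ? 4 : 7;
}
```

Under these assumptions lea costs one byte more than sub for typical frame sizes (4 vs. 3, or 7 vs. 6). The one exception is n = 128, where -128 still fits in a disp8 but +128 does not fit in an imm8.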
Post by Anders Magnusson
...and if the program is supposed to be run on a real i386, I assume
that size matters more than speed :-)
Hm, maybe time to add support for -Os? :-)
I'd support it; and indeed using enter in the -Os case makes sense.
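A minimal sketch of what such a choice could look like in a code generator. This is not pcc's actual local2.c code: the function name, the optimize_for_size flag and the buffer-based interface are assumptions made for illustration. It uses enter only when there is something to subtract (as in the commit message) and the frame size fits enter's 16-bit immediate.

```c
#include <stdio.h>

/* Write an AT&T-syntax i386 prologue for a frame of addto bytes into buf.
 * Hypothetical helper: optimize_for_size stands in for a -Os style flag.
 * Returns the value of snprintf (number of characters needed). */
int format_prologue(char *buf, size_t len, unsigned addto,
                    int optimize_for_size)
{
    /* Nothing to subtract: push/mov is 3 bytes, shorter than enter (4). */
    if (addto == 0)
        return snprintf(buf, len, "\tpushl %%ebp\n\tmovl %%esp,%%ebp\n");

    /* enter takes a 16-bit frame size, so larger frames fall through. */
    if (optimize_for_size && addto <= 0xFFFF)
        return snprintf(buf, len, "\tenter $%u,$0\n", addto);

    /* Default: the longer sequence that avoids the complex instruction. */
    return snprintf(buf, len,
                    "\tpushl %%ebp\n\tmovl %%esp,%%ebp\n\tsubl $%u,%%esp\n",
                    addto);
}
```

Keeping the speed-oriented sequence as the default and gating enter behind the size flag follows the trade-off discussed in this thread: smaller code versus possibly slower decoding.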


Antoine
