Post by Anders Magnusson
Post by Iain Hibbert
Use "enter" if we have to subtract from stack. Saves a bunch of
bytes per function.
although apparently, it uses more cycles :)
http://ww2.cs.mu.oz.au/~mgiuca/x86/
BTW: that page is slightly wrong about the number of bytes saved: the
non-enter 32-bit prologue is 6 bytes when addto is less than 128, and 9
bytes otherwise, not 12 bytes; still, there is a clear saving (2 or 5 bytes).
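To make the count concrete, here is a small C sketch of the arithmetic. The function names are made up for illustration; addto is the frame size as above, and the sizes assume the usual IA-32 encodings (push ebp is 1 byte, mov ebp, esp is 2, sub esp takes an 8-bit or 32-bit immediate, and enter imm16, 0 is always 4 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Size of "sub esp, imm": 83 EC ib (3 bytes) when the immediate fits in a
 * sign-extended 8-bit field, 81 EC id (6 bytes) otherwise. */
size_t sub_esp_size(unsigned addto)
{
    return addto < 128 ? 3 : 6;
}

/* push ebp (1 byte) + mov ebp, esp (2 bytes) + sub esp, imm */
size_t classic_prologue_size(unsigned addto)
{
    return 1 + 2 + sub_esp_size(addto);
}

/* enter imm16, 0 is encoded as C8 iw 00: 4 bytes whatever the frame size. */
size_t enter_prologue_size(unsigned addto)
{
    assert(addto <= 0xFFFF);    /* enter only takes a 16-bit frame size */
    return 4;
}

/* Bytes saved by using enter instead of the classic sequence. */
size_t bytes_saved(unsigned addto)
{
    return classic_prologue_size(addto) - enter_prologue_size(addto);
}
```

With these, bytes_saved(16) gives 2 and bytes_saved(4096) gives 5, which are the two cases mentioned above.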
Post by Anders Magnusson
On a real i386, yes, but I really doubt that it does that on modern CPUs.
On modern CPUs, the fact is that I do not know (and I did not search for long).
But Intel, in its "Architecture Optimization" reference manual, lays down a
rule (currently #31, p. 3-21 of the March 2014 ed.) to avoid "complex
instructions" (sic) like enter, leave or loop, since they have more than
four micro-ops and require multiple cycles to decode.
This rule has been there for a loooong time now, so perhaps it is obsolete,
perhaps even long obsolete; perhaps it does not apply to AMD. Perhaps it
does not apply to enter x, $0; but the fact that it mentions leave makes me
think that at least at some point, there was a penalty even for the
"simple" case (now, elementary reasoning suggests that if something is to
be improved in the microcode design here, it is fast decoding of leave,
since it is quite often used.)
Also interesting reading is the Atom-specific rule #19, page 14-11, which
states that leave is still emulated in microcode on Atom; by the way, it
also recommends using lea rather than sub to adjust esp (adds a byte?)
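On whether lea adds a byte: with the usual 32-bit encodings it seems to, because an esp-based address needs a SIB byte. A sketch with the bytes written out by hand for an 8-byte adjustment (the array and function names are hypothetical, and only the 8-bit displacement case is shown):

```c
#include <assert.h>
#include <stddef.h>

/* sub esp, 8 : opcode 83 /5, ModRM EC, imm8 */
const unsigned char sub_esp_8[] = { 0x83, 0xEC, 0x08 };

/* lea esp, [esp-8] : opcode 8D, ModRM 64, SIB 24, disp8 (-8 is F8) */
const unsigned char lea_esp_8[] = { 0x8D, 0x64, 0x24, 0xF8 };

/* Extra bytes taken by the lea form compared with the sub form. */
size_t lea_extra_bytes(void)
{
    /* guard the unsigned subtraction below */
    assert(sizeof lea_esp_8 >= sizeof sub_esp_8);
    return sizeof lea_esp_8 - sizeof sub_esp_8;
}
```

The same one-byte difference holds for the 32-bit forms (7 bytes for lea with disp32 against 6 for sub with imm32).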
Post by Anders Magnusson
...and if the program is supposed to be run on a real i386, I assume
that size matters more than speed :-)
Hm, maybe time to add support for -Os? :-)
I'd support it; and indeed using enter in the -Os case makes sense.
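Roughly, the choice could look like the sketch below. This is not pcc's actual code: emit_prologue, its parameters and the opt_size flag (standing in for -Os) are all hypothetical, and the output is plain AT&T syntax.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write an i386 prologue to 'out' and return its
 * encoded size in bytes. 'addto' is the number of bytes to reserve on the
 * stack; 'opt_size' is nonzero when optimizing for size. */
int emit_prologue(FILE *out, unsigned addto, int opt_size)
{
    assert(out != NULL);

    /* Size matters: a single 4-byte enter, usable only when there is
     * something to subtract and it fits the 16-bit immediate. */
    if (opt_size && addto > 0 && addto <= 0xFFFF) {
        fprintf(out, "\tenter $%u,$0\n", addto);
        return 4;
    }

    /* Speed matters (or enter is not usable): the classic sequence. */
    fprintf(out, "\tpushl %%ebp\n\tmovl %%esp,%%ebp\n");
    if (addto > 0) {
        fprintf(out, "\tsubl $%u,%%esp\n", addto);
        return 3 + (addto < 128 ? 3 : 6);
    }
    return 3;
}
```

For example, emit_prologue(stdout, 16, 1) writes one enter line and returns 4, while emit_prologue(stdout, 16, 0) writes the three-instruction form and returns 6.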
Antoine