pcc-20110629 corrupted static variable (32bit x86)

Discussion:

Szabolcs Nagy

2011-06-30 09:41:20 UTC

recent pcc miscompiles some of my code around local
static variables..

i tried to make a minimal test case, but it's
still several files hence tar.gz

the output of my pcc -S is included

"PCC: pcc 1.1.0.DEVEL 20110629 for i686-pc-linux-gnu"

Anders Magnusson

2011-07-01 14:44:12 UTC

Hi,

it may just have been fixed, I tried right now and both test cases gave
the same output.
Can you please recheck?

-- Ragge

Szabolcs Nagy

2011-07-01 14:51:53 UTC

Post by Anders Magnusson
it may just have been fixed, I tried right now and both test cases
gave the same output.
Can you please recheck?

thanks it seems to be fixed now

Szabolcs Nagy

2011-07-01 19:01:17 UTC

Post by Szabolcs Nagy

Post by Anders Magnusson
it may just have been fixed, I tried right now and both test cases
gave the same output.
Can you please recheck?

thanks it seems to be fixed now

but openssl 1.0.0 sha512 test still fails
with similar miscompilation

Szabolcs Nagy

2011-07-01 19:57:23 UTC

Post by Szabolcs Nagy
but openssl 1.0.0 sha512 test still fails
with similar miscompilation

another example of miscompilation is probably heirloom/oawk
a simple ./awk '' segfaults
it can be heirloom's fault, i didn't investigate the problem

Thorsten Glaser

2011-07-01 20:28:13 UTC

Post by Szabolcs Nagy
another example of miscompilation is probably heirloom/oawk
a simple ./awk '' segfaults

That's probably still something else:

0x1c008ce8 <execute+200>: lea eax,[ebp-24]
0x1c008ceb <execute+203>: push eax
0x1c008cec <execute+204>: call eax

eax 0xcfbf1980 -809559680
esp 0xcfbf1968 0xcfbf1968

Should be call *eax I think (pcc-20110701.tgz here; mksh regression
test suite after 3-stage bootstrap passes though).

OTOH, pcc debug info (all is built with -g and no -O*) is for naught:

(gdb) r
Starting program: /home/tg/Misc/Vendor/heirloom/oawk/awk ''

Program received signal SIGSEGV, Segmentation fault.
0x1c008cec in execute (a=0xcfbf19a8, n=-1451448736, a=0xcfbf19a8, n=-1451448736) at run.c:139
139 x = (*proc)(a->narg,a->nobj);
(gdb) print a
$1 = (int **) 0xcfbf19a8
(gdb) print proc
No symbol "proc" in current context.
(gdb) print x
No symbol "x" in current context.

Wrong type for a, no symbols there.

bye,
//mirabilos

Anders Magnusson

2011-07-02 07:53:12 UTC

Can you extract a test case that gives this error? So I can fix it.

-- Ragge

Szabolcs Nagy

2011-07-02 10:49:11 UTC

Post by Anders Magnusson
Can you extract a test case that gives this error? So I can fix it.

actually it is pretty hard to extract a test case..

in the openssl case the error only happens when i compile
with optimization (openssl default flags are -O3 -fomit-frame-pointer)
and the entire code is so complex and interdependent that
it's hard to cut the relevant parts out

in the awk case i could produce SIGSEGV, SIGILL and SIGFPE
as well depending on where i put debugging printfs (as gdb
does not work) eg if i put a printf right before
x = (*proc)(a->narg,a->nobj);
line in run.c then the error entirely disappears..

at least i know now that it is run.c that gets miscompiled
(if i compile everything with gcc but run.c then the error
is still there)

the openssl case looks harder as there is a lot of code
involved

Szabolcs Nagy

2011-07-02 14:54:30 UTC

Post by Szabolcs Nagy

Post by Anders Magnusson
Can you extract a test case that gives this error? So I can fix it.

at least i know now that it is run.c that gets miscompiled
(if i compile everything with gcc but run.c then the error
is still there)

ok i cut down the awk code a bit
it should be simpler to find the error now

Szabolcs Nagy

2011-07-02 21:50:00 UTC

Post by Szabolcs Nagy
the openssl case looks harder as there is many code
involved

i managed to cut out a reasonable sized part that
reproduces the issue

using the functions of sha512.c directly from the
test code sha512t.c did not produce an error
so i had to include some parts of the digest envelope
layer..

if i compile sha512.c with pcc but all other
.c with gcc then the error appears

so my guess is that sha512.c gets miscompiled
and it corrupts the evp ctx structure

the error only appears when i compile with optimization

Iain Hibbert

2011-07-03 18:45:23 UTC

Post by Szabolcs Nagy
so my guess is that sha512.c gets miscompiled
and it corrupts the evp ctx structure

the SHA512_Final() function from sha512.c corrupts the stack

in digest.c line 94, we have

ret=ctx->digest->final(ctx,md);

but if I add printf("ctx %p\n", ctx) before and after that, the value
changes..

EVP_DigestFinal_ex: A 0xbfbfe4a8
EVP_DigestFinal_ex: B 0xa54ca49f

this call is equivalent to SHA512_Final(md, ctx->md_data)

..will look more closely after some food :)

iain

Iain Hibbert

2011-07-03 20:22:58 UTC

Ok, so.. the ctx value is being held in %esi during the function call, and
SHA512_Final() does save and restore it in the normal way, keeping it at
-8(%ebp)

..BUT on line 206 we have

    case SHA512_DIGEST_LENGTH:
        for (n=0;n<SHA512_DIGEST_LENGTH/8;n++)
        {
            SHA_LONG64 t = c->h[n];

            *(md++) = (unsigned char)(t>>56);
            *(md++) = (unsigned char)(t>>48);
            *(md++) = (unsigned char)(t>>40);
            *(md++) = (unsigned char)(t>>32);
            *(md++) = (unsigned char)(t>>24);
            *(md++) = (unsigned char)(t>>16);
            *(md++) = (unsigned char)(t>>8);
            *(md++) = (unsigned char)(t);      /* <---- line 206 */
        }
        break;

and the assembler for this line is as follows

        .stabn 68,0,206,.LL430-SHA512_Final
.LL430:
        movl %esi,-8(%ebp)
        movb -8(%ebp),%dl
        movl 8(%ebp),%eax
        incl %eax
        movl %eax,8(%ebp)
        movb %dl,-1(%eax)

which I don't think is right.. for some reason, ccom has written a
register out over the stored value, in order to get a byte portion?

regards,
iain

Iain Hibbert

2011-07-04 10:51:50 UTC

Post by Iain Hibbert
which I don't think is right.. for some reason, ccom has written a
register out over the stored value, in order to get a byte portion?

a shorter example

extern long long x;

void
foo(unsigned char *md)
{
        long long t = x;

        *(md++) = (unsigned char)(t >> 8);
        *(md++) = (unsigned char)(t);
}

when compiled with "pcc -O2 -S", produces the following

        .text
        .align 4
        .globl foo
        .type foo,@function
foo:
        pushl %ebp
        movl %esp,%ebp
        subl $12,%esp
        movl %ebx,-4(%ebp)      ; [2]
        movl %esi,-8(%ebp)
        movl %edi,-12(%ebp)
.L12:
        movl 8(%ebp),%ebx
.L14:
        movl x+4,%edi
        movl x,%esi
        movb $8,%cl
        movl %esi,%eax
        movl %edi,%edx
        shrdl %edx,%eax
        sarl %cl,%edx
        testb $32,%cl
        je 1f
        movl %edx,%eax
        sarl $31,%edx
1:
        incl %ebx
        movb %al,-1(%ebx)
        movl %esi,-4(%ebp)      ; [1]
        movb -4(%ebp),%al
        incl %ebx
        movb %al,-1(%ebx)
        movl -4(%ebp),%ebx      ; [2]
        movl -8(%ebp),%esi
        movl -12(%ebp),%edi
        leave
        ret
        .size foo,.-foo
        .ident "PCC: pcc 1.1.0.DEVEL 20110701 for netbsd-i386"

Which shows that "-4(%ebp)" is being used as a scratch area at [1], but
also to store the upstream value of %ebx at [2]

I think though, looking at the rest of the assembly, that the scratch area
ought to have been "4(%ebp)" since that was otherwise unused?

(in any case, this code appears to be emitted from arch/local2.c line 602,
but my expertise ends here; I don't know what to do about it :)

iain

Anders Magnusson

2011-08-06 15:18:15 UTC

Thanks all for the help with this, I just fixed it :-)

The bug was that a cast from long long to char needed stack storage to do
the actual cast, but the routine to get stack space couldn't be called so
late.

This occurred when a value in, for example, esi should be cast to a char
stored in ah, and since al may be occupied by another value it needs to
store the word on the stack and then read in a byte.

-- Ragge

Szabolcs Nagy

2011-08-06 23:32:14 UTC

Post by Anders Magnusson
Thanks all for the help with this, I just fixed it :-)

nice

i can confirm that the sha512 test of openssl now passes

unfortunately openssl-1.0.0d test still fails with pcc
(either GF2m mod test fails or the openssl binary segfaults
depending on configuration parameters, but i have to spend
more time on it to see what's going on)