error("invalid preprocessor directive") [Was: CVS commit: pcc/cc/cpp]
Antoine Leca
2013-02-22 15:56:14 UTC
Module Name: pcc
Committed By: plunky
Date: Wed Oct 31 12:13:43 UTC 2012
pcc/cc/cpp: token.c
simplify logic in ppdir(), and error() for invalid directive
cvs rdiff -u -r1.97 -r1.98 pcc/cc/cpp/token.c
Last part:

-out: while ((ch = inch()) != '\n' && ch != -1)
- ;
- unch('\n');
+out: error("invalid preprocessor directive");
}

This change causes the behaviour on garbage after # to change,
from being silently ignored to forcing a fatal error.

At the very least, the new behaviour has to be optional: for
reasons which are beyond my understanding, a lot of i386 assembler code
out there seems to consider that the comment character is not / but #, and
furthermore people expect assembler-with-cpp to deal quietly with such
cases; needless to say, the new behaviour breaks that expectation.

A new option to disable the error case could be designed, but I am
asking myself why it causes a fatal error (error() ends with exit()
in cpp). It seems to me to be a case for warning(), since it seems easy
to recover from such a case (just by using the previous code).
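
For illustration, a minimal sketch of that kind of recovery: emit a warning, then skip to the end of the line as the removed loop did. The function name, its parameters, and the use of a plain string buffer instead of inch()/unch() are assumptions made for the example; this is not the pcc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical recovery helper, not pcc code: warn about the bad
 * directive, then skip the rest of the line.
 * pos is an index into buf somewhere on the offending line. The return
 * value is the index of the terminating '\n' (or of the NUL at end of
 * input), so the caller still sees the line break, much as unch('\n')
 * arranged in the old code.
 */
size_t skip_bad_directive(const char *buf, size_t pos, int lineno)
{
    assert(buf != NULL);
    fprintf(stderr, "line %d: warning: invalid preprocessor directive\n",
            lineno);
    while (buf[pos] != '\n' && buf[pos] != '\0')
        pos++;
    return pos;
}
```

Processing then resumes at the returned index, so the rest of the file is still preprocessed instead of cpp exiting.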

Also, under the C standard, such a case is a "non-directive" (6.10)
which does not have any defined behaviour in the Standard. So anything
is conforming, from accepting silently to aborting.

I had a look at the behaviour of clang on this case, and found
(PPDirectives.cpp, around lines 585-600) that they special-case
assembler-with-cpp so as not to emit the diagnostic and to pass the #
along with the rest of the line, with macros expanded.
Testing suggests GCC behaves the same way.
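
The decision described above can be sketched roughly as follows. This is only an illustration: the enum, the function name, and the asm_mode flag are hypothetical and are not taken from clang, GCC, or pcc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What to do with a line that starts with '#'. */
enum action { HANDLE_DIRECTIVE, PASS_THROUGH, DIAGNOSE };

/*
 * name is the identifier that follows '#'. asm_mode is a hypothetical
 * flag meaning "the input is assembler-with-cpp". Unknown names are
 * passed along untouched in asm mode and diagnosed otherwise.
 */
enum action classify_directive(const char *name, int asm_mode)
{
    static const char *const known[] = {
        "define", "undef", "include", "if", "ifdef", "ifndef",
        "elif", "else", "endif", "line", "error", "pragma"
    };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(name, known[i]) == 0)
            return HANDLE_DIRECTIVE;
    return asm_mode ? PASS_THROUGH : DIAGNOSE;
}
```

With this shape, a line such as "# foo" is diagnosed in a C file but left for the assembler to treat as a comment in a .S file.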


Antoine
Thorsten Glaser
2013-02-22 21:57:06 UTC
Post by Antoine Leca
This change causes the behaviour on garbage after # to change,
from being silently ignored to forcing a fatal error.
Ouch! That’s like disallowing ‘'’ in comments… a bad idea.

In traditional C IIRC anything after a # goes if the # is
followed by some whitespace before the “garbage”, and if
the “garbage” is a valid command it’s still accepted; this
is used, for example, to make
# warning "foo"
work on newer compilers (they parse it as #warning) and
older ones (they just warn about an unknown directive,
or ignore the line altogether). Never have I seen it
error out.
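
As a small illustration of the whitespace point: the standard grammar allows white space between the # and the directive name, so the lines below are ordinary directives. The example uses #define and #if rather than #warning, since #warning was long a compiler extension; the macro and function names are made up for the example.

```c
#include <assert.h>

/* White space between '#' and the directive name is permitted. */
#  define ANSWER 42
#  if ANSWER == 42
#    define MATCHED 1
#  else
#    define MATCHED 0
#  endif

int answer_matched(void)
{
    assert(ANSWER == 42);   /* "#  define" above was a real definition */
    return MATCHED;
}
```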

That being said, assembler comment characters are crazy
all over, twice and bent backwards. IMHO everyone writing
assembly for Unix should use .S file extension and use
cpp comments, that is /* … */, in their code, so that
as(1) doesn’t even *see* them.

Sadly, reality differs, as usual.

bye,
//mirabilos
--
„no: BerliOS and Sourceforge are platforms for projects, github is
a platform for lone fighters“
-- this quote is proof that even a blind hen
sometimes finds a grain, or, in this case, can be right
Iain Hibbert
2013-02-24 08:44:17 UTC
Post by Antoine Leca
Module Name: pcc
Committed By: plunky
Date: Wed Oct 31 12:13:43 UTC 2012
pcc/cc/cpp: token.c
simplify logic in ppdir(), and error() for invalid directive
cvs rdiff -u -r1.97 -r1.98 pcc/cc/cpp/token.c
-out: while ((ch = inch()) != '\n' && ch != -1)
- ;
- unch('\n');
+out: error("invalid preprocessor directive");
}
This change causes the behaviour on garbage after # to change,
from being silently ignored to forcing a fatal error.
Hm, I see.. perhaps this was too severe.. I personally prefer that
automatic processing produces an error when it encounters incomprehensible
input, rather than just ignoring it
Post by Antoine Leca
I had a look at the behaviour of clang on this case, and found
(PPDirectives.cpp, around lines 585-600) that they special-case
assembler-with-cpp so as not to emit the diagnostic and to pass the #
along with the rest of the line, with macros expanded.
Testing suggests GCC behaves the same way.
hm, my testing with gcc (4.5.4) suggested that it produces a similar
error..

% cat foo.c
# foo
% gcc -c foo.c
foo.c:1:3: error: invalid preprocessing directive #foo

though, I see that with -xassembler-with-cpp or -xassembler it does not so
we should handle that. I will look at it this week

iain
Antoine Leca
2013-02-25 09:23:25 UTC
Post by Iain Hibbert
Post by Antoine Leca
I had a look at the behaviour of clang on this case, and found
(PPDirectives.cpp, around lines 585-600) that they special-case
assembler-with-cpp so as not to emit the diagnostic and to pass the #
along with the rest of the line, with macros expanded.
Testing suggests GCC behaves the same way.
hm, my testing with gcc (4.5.4) suggested that it produces a similar
error..
% cat foo.c
# foo
% gcc -c foo.c
foo.c:1:3: error: invalid preprocessing directive #foo
though, I see that with -xassembler-with-cpp or -xassembler it does not so
Neither if the file is named foo.S; so we agree on the experiments.
Post by Iain Hibbert
we should handle that.
It seems gcc handles it through -lang-asm (or perhaps
-fno-directives-only in GCC 4.x?), which switch we do not have or pass;
the only sign we got from the driver is __ASSEMBLER__ being #defined. Is
it enough? (i.e., can we let some plain code compiled with that symbol
defined for whatever reason be handled as .S)
Else it should not be very complex to add another knob in
preprocess_input, under the if(ascpp)

Also, there is the point about macro expansion behind the #, as the
comments in clang explain (and to which -fno-directives-only could be
related? not sure), and which AFAICS pcc was not expanding.


Antoine
Iain Hibbert
2013-02-25 21:49:28 UTC
Post by Antoine Leca
Post by Iain Hibbert
Post by Antoine Leca
I had a look at the behaviour of clang on this case, and found
(PPDirectives.cpp, around lines 585-600) that they special-case
assembler-with-cpp so as not to emit the diagnostic and to pass the #
along with the rest of the line, with macros expanded.
Testing suggests GCC behaves the same way.
hm, my testing with gcc (4.5.4) suggested that it produces a similar
error..
% cat foo.c
# foo
% gcc -c foo.c
foo.c:1:3: error: invalid preprocessing directive #foo
though, I see that with -xassembler-with-cpp or -xassembler it does not so
Neither if the file is named foo.S; so we agree on the experiments.
that should be the same; though pcc front end is not really clever enough
at this time to deal with all the combinations of file types
Post by Antoine Leca
Post by Iain Hibbert
we should handle that.
It seems gcc handles it through -lang-asm (or perhaps
-fno-directives-only in GCC 4.x?), which switch we do not have or pass;
the only sign we got from the driver is __ASSEMBLER__ being #defined. Is
it enough? (i.e., can we let some plain code compiled with that symbol
defined for whatever reason be handled as .S)
Else it should not be very complex to add another knob in
preprocess_input, under the if(ascpp)
No, I will add a knob to cpp that the driver will apply for assembler
files, to indicate that a non-directive may be discarded silently.
Post by Antoine Leca
Also, there is the point about macro expansion behind the #, as the
comments in clang explain
It does not explain much though, I wonder if there is an actual use case
for that..

I think if somebody is using the C preprocessor to expand macros for an
assembler file, they should probably realise that # is a special character
to the C preprocessor..

iain
Antoine Leca
2013-02-26 08:18:41 UTC
Post by Iain Hibbert
Post by Antoine Leca
Also, there is the point about macro expansion behind the #, as the
comments in clang explain
It does not explain much though, I wonder if there is an actual use case
for that..
To be honest I cannot think of any...
Post by Iain Hibbert
I think if somebody is using the C preprocessor to expand macros for an
assembler file, they should probably realise that # is a special character
to the C preprocessor..
Right. And since this would complicate our code without
useful justification, it makes sense to drop it.

On the other hand, this is a genuine difference from the elder brothers;
so I guess this should be recorded somewhere, just to avoid wasting
time searching for that information in the future.
Perhaps this thread is enough, in fact.


Antoine
Szabolcs Nagy
2013-02-26 08:49:03 UTC
Post by Antoine Leca
Post by Iain Hibbert
It does not explain much though, I wonder if there is an actual use case
for that..
To be honest I cannot think of any...
#define A "foo"
#define f(x) #x
puts(f(#) A);

should be translated to

puts("#" "foo");
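
A compilable version of the same example, as a sketch; the wrapper function name is made up for illustration. Here the # is a macro argument in the middle of a line, not the start of a directive.

```c
#include <assert.h>
#include <string.h>

#define A "foo"
#define f(x) #x

/*
 * f(#) stringizes the lone '#' token into "#", A expands to "foo",
 * and the two adjacent string literals are then concatenated.
 */
const char *hash_foo(void)
{
    const char *s = f(#) A;   /* becomes "#" "foo" */
    assert(strcmp(s, "#foo") == 0);
    return s;
}
```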
Iain Hibbert
2013-02-26 19:12:02 UTC
Post by Szabolcs Nagy
Post by Antoine Leca
Post by Iain Hibbert
It does not explain much though, I wonder if there is an actual use case
for that..
To be honest I cannot think of any...
#define A "foo"
#define f(x) #x
puts(f(#) A);
should be translated to
puts("#" "foo");
If it does not already do that, then that would be a different issue..

regards,
iain
Szabolcs Nagy
2013-02-26 19:56:00 UTC
Post by Iain Hibbert
Post by Szabolcs Nagy
#define A "foo"
#define f(x) #x
puts(f(#) A);
should be translated to
puts("#" "foo");
If it does not already do that, then that would be a different issue..
no

the question was if macro expansion makes sense after '#'
and i showed a case that is valid c code and the result
depends on correct macro expansion after '#'
Iain Hibbert
2013-02-26 21:49:56 UTC
Post by Szabolcs Nagy
Post by Iain Hibbert
Post by Szabolcs Nagy
#define A "foo"
#define f(x) #x
puts(f(#) A);
should be translated to
puts("#" "foo");
If it does not already do that, then that would be a different issue..
no
the question was if macro expansion makes sense after '#'
and i showed a case that is valid c code and the result
depends on correct macro expansion after '#'
I understood it was relating to the following

#define FOO foo
# FOO

when compiled using "gcc -xassembler-with-cpp", produces

# 1 "foo"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "foo"

# foo

so as you see, the macro expansion is carried out as if the non-directive
was text, rather than just discarding the line as a comment. It is this
feature that I am not sure if there is a use for..

iain
Szabolcs Nagy
2013-02-26 23:34:22 UTC
Post by Iain Hibbert
I understood it was relating to the following
#define FOO foo
# FOO
when compiled using "gcc -xassembler-with-cpp", produces
# foo
so as you see, the macro expansion is carried out as if the non-directive
was text, rather than just discarding the line as a comment. It is this
feature that I am not sure if there is a use for..
ah ok, this seems silly indeed

Szabolcs Nagy
2013-02-26 08:38:54 UTC
Post by Iain Hibbert
Post by Antoine Leca
Also, there is the point about macro expansion behind the #, as the
comments in clang explain
It does not explain much though, I wonder if there is an actual use case
for that..
"The preprocessing tokens within a preprocessing directive
are not subject to macro expansion unless otherwise stated."

if # starts a directive then in general there is no expansion
(only after if,elif,include,line)

but in normal text lines there is
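
A small sketch of that distinction, with made-up macro and function names: the replacement list of a #define is stored unexpanded, the controlling expression of #if is expanded when the directive is processed, and ordinary text lines are expanded where they appear.

```c
#include <assert.h>

#define LIMIT 4
#define COUNT LIMIT     /* stored as the token LIMIT, not as 4 */

#if COUNT > 2           /* #if: COUNT -> LIMIT -> 4 is expanded here */
#define LARGE 1
#else
#define LARGE 0
#endif

#undef LIMIT
#define LIMIT 1         /* from here on, COUNT expands to 1 */

int count_now(void)
{
    assert(COUNT == 1); /* text line: expanded with the current LIMIT */
    return COUNT;
}

int was_large(void)
{
    return LARGE;       /* decided earlier, when LIMIT was still 4 */
}
```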
Szabolcs Nagy
2013-02-24 19:48:48 UTC
Post by Antoine Leca
At the very least, the new behaviour has to be optional: for
reasons which are beyond my understanding, a lot of i386 assembler code
out there seems to consider that the comment character is not / but #, and
furthermore people expect assembler-with-cpp to deal quietly with such
cases; needless to say, the new behaviour breaks that expectation.
4.3bsd in 1990 already used all possible comment
characters in asm: /, #, |, ; and c style /**/

the original pdp unix used '/' but i dont think
there was ever a convention to use that on i386

i think sys v unix used #

and these cover historical unix implementations

when asm code is preprocessed with cpp then the
only safe comment method is c style comments
Post by Antoine Leca
Also, under the C standard, such a case is a "non-directive" (6.10)
which does not have any defined behaviour in the Standard. So anything
is conforming, from accepting silently to aborting.
no, if something is not specified by the standard then
it is undefined behaviour by omission
(not implementation defined, and not unspecified behaviour)

non-directives are present in the standard because
they are allowed in a skipped ifdef group

otherwise their semantics are undefined

(for implementation defined preprocessing directives
the standard specifies #pragma)
Post by Antoine Leca
I had a look at the behaviour of clang on this case, and found
(PPDirectives.cpp, around lines 585-600) that they special-case
assembler-with-cpp so as not to emit the diagnostic and to pass the #
along with the rest of the line, with macros expanded.
Testing suggests GCC behaves the same way.
that seems reasonable, fatal error is a bad thing to do
Antoine Leca
2013-02-25 10:12:06 UTC
Post by Szabolcs Nagy
Post by Antoine Leca
At the very least, the new behaviour has to be optional: for
reasons which are beyond my understanding, a lot of i386 assembler code
out there seems to consider that the comment character is not / but #, and
furthermore people expect assembler-with-cpp to deal quietly with such
cases; needless to say, the new behaviour breaks that expectation.
4.3bsd in 1990 already used all possible comment
characters in asm: /, #, |, ; and c style /**/
the original pdp unix used '/' but i dont think
there was ever a convention to use that on i386
Mmmh, this is going off-topic but here are my $.02: the original AT&T
assembler for 80386, which was a port (done by Interactive Systems
Corporation) of the "portable" assembler of System V as of R2, indeed
used / for comments; this was continued into Solaris encumbered AS and
required there the use of \/ for the division operator (the operator was
added later); GAS port for i386 a few years later caught (I think from
the BSD base, from the standard VAX assembler syntax) the use of # for
comments, and used/allowed both symbols; later they implemented a
--divide option to AS which removes the possibility to use single / as
comment introducer (this option is standard in NetBSD at least.)
Post by Szabolcs Nagy
when asm code is preprocessed with cpp then the
only safe comment method is c style comments
Thanks, but like most people reading this list, I am not actually writing
code, but really trying to make code written by others work with pcc.
Post by Szabolcs Nagy
Post by Antoine Leca
Also, under the C standard, such a case is a "non-directive" (6.10)
which does not have any defined behaviour in the Standard. So anything
is conforming, from accepting silently to aborting.
no, if something is not specified by the standard then
it is undefined behaviour by omission
(not implementation defined, and not unspecified behaviour)
"No" what? I do not see in what you are writing, where it actually
differs from what I wrote.
Post by Szabolcs Nagy
non-directives are present in the standard because
they are allowed in a skipped ifdef group
Sorry if I am picky, but can you quote the place where it is said that
outside a skipped {if,elif,else} group, syntax does not allow
'non-directive'? Furthermore, it would then be a syntax error and as
such requiring a diagnostic.
Post by Szabolcs Nagy
otherwise their semantics are undefined
... and the compiler conforms doing whatever it wants; seems we agree!


Antoine
Szabolcs Nagy
2013-02-25 12:07:06 UTC
Post by Antoine Leca
Mmmh, this is going off-topic but here are my $.02: the original AT&T
assembler for 80386, which was a port (done by Interactive Systems
Corporation) of the "portable" assembler of System V as of R2, indeed
used / for comments; this was continued into Solaris encumbered AS and
required there the use of \/ for the division operator (the operator was
added later); GAS port for i386 a few years later caught (I think from
the BSD base, from the standard VAX assembler syntax) the use of # for
comments, and used/allowed both symbols; later they implemented a
--divide option to AS which removes the possibility to use single / as
comment introducer (this option is standard in NetBSD at least.)
i see, thanks for the explanation
Post by Antoine Leca
Post by Szabolcs Nagy
Post by Antoine Leca
which does not have any defined behaviour in the Standard. So anything
is conforming, from accepting silently to aborting.
no, if something is not specified by the standard then
it is undefined behaviour by omission
(not implementation defined, and not unspecified behaviour)
"No" what? I do not see in what you are writing, where it actually
differs from what I wrote.
sorry i misread your comments
Post by Antoine Leca
Post by Szabolcs Nagy
non-directives are present in the standard because
they are allowed in a skipped ifdef group
Sorry if I am picky, but can you quote the place where it is said that
outside a skipped {if,elif,else} group, syntax does not allow
it is allowed and that's what i meant

the grammar is specified this way so non-directive
is not a syntax error in a skipped ifdef group

but the semantics is not defined so whenever it appears
outside skipped ifdef groups it invokes undefined behaviour
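
A sketch of the skipped-group case, with made-up names. The assumption here is that the compiler, like the common ones I know of, only scans a skipped group for conditional directives and so stays quiet about the non-directive inside it.

```c
#include <assert.h>

#if 0
# this line is a non-directive, but it sits in a skipped group
#endif

#define REACHED 1

int reached(void)
{
    assert(REACHED);    /* preprocessing carried on past the group above */
    return REACHED;
}
```

Moving the same line outside the #if 0 group is the case the thread is about, where the result depends on the implementation.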
Iain Hibbert
2013-02-26 19:39:50 UTC
Post by Antoine Leca
This change causes the behaviour on garbage after # to change,
from being silently ignored to forcing a fatal error.
OK, so I made a change: for -xassembler-with-cpp files it will now
ignore the garbage.

though I see that there is a bug wrt .S files; the front end does not seem
to set 'ascpp' or define __ASSEMBLER__, I can look at that later

iain