The parser and lexer deal with the syntax of the language. Syntax is easy. Semantics is about meaning, and it's quite easy to give a syntactic construct a meaning that isn't clearly defined: just write some vague language in the specification. Don't confuse the semantics of the language with the behavior of a particular implementation (e.g., GCC). It's important for the language to have clear, well-defined semantics so that implementations can be compatible and programmers, compilers, analysis tools, etc. can all agree on the meaning of a program.
My question was "If you have a computer language which has defined elements of the syntax how can you not have clearly defined meanings of combinations of the syntax?"
Also, these links don't show anything defined in C that isn't "deterministic". I can look at all of these pieces of code and see what is happening, and it all stems from a misinterpretation of what the standards specify, not from a "lack of a clear and complete semantics".
Some of these are also not issues at all. I have no idea what they mean to show with provenance_equality_global_yx.c. They take two pointers, offset one of them, and check whether the two pointers are equal. It's obvious that in most cases this will be false, but it's not an invariant.
First of all, there is no guarantee about the layout or ordering of data in the binary. Secondly, a pointer is a memory address, and adding 1 to an int pointer advances that address by sizeof(int) (in this case). If you're on a system that needs specific aligning and special datatype sizes it might even be impossible to have two ints 1 sizeof(int) away.
There's no reason to expect the same output from that code on every architecture, because 1) it may be impossible on some hardware and 2) the spec doesn't say that it will work.
> If you're on a system that needs specific aligning and special datatype sizes it might even be impossible to have two ints 1 sizeof(int) away
In this case you're right, since the objects being considered aren't arrays. But if p points to the first element of an array object of length n, then for i <= n - 2 you must have &p[i] + 1 == &p[i + 1], from the definition of array subscripting in 6.5.2.1, paragraph 2, of N1570. So it must always be possible to have two ints 1 sizeof(int) apart; otherwise it's not C.
----
The entire Cerberus research project is about finding inconsistencies between the spec, compiler implementations, and human understanding, and about creating an accurate formal specification of C. They have found numerous such inconsistencies. N2090, for example, is about a spec bug: the standard is inconsistent with another WG14 document and with practice.
> My question was "If you have a computer language which has defined elements of the syntax how can you not have clearly defined meanings of combinations of the syntax?"
Because the C standard has implementation-defined and undefined behaviour, which means a syntactically valid C program can have a broad range of possible meanings.
You more or less answered your own question. The very fact that they are able to misinterpret the standard is sufficient proof that the standard lacks a clear and complete semantics. If the standard were formalized, it would not allow misinterpretations.
Think of simple mathematical equations: you can misinterpret them only if you don't know enough mathematics; the equation itself allows only one meaning.
They also make it memory-safe and type-safe by design, so any undefined behavior should be confined to SYSTEM/UNSAFE modules hidden behind interfaces that you know to check.
Follow some links from the page for concrete examples:
https://www.cl.cam.ac.uk/~pes20/cerberus/n2090.html
https://www.cl.cam.ac.uk/~pes20/cerberus/n2089.html
And, of course, their paper, which has examples and explanations: https://www.cl.cam.ac.uk/~km569/into_the_depths_of_C.pdf