Enquire: Everything you wanted to know about your C compiler and machine

dianeb · on Feb 24, 2019

I wrote C compilers for several large companies in the 80's and 90's. Everyone in a compiler group had something like this if for no other reason than to demonstrate why "bugs" weren't bugs but "features". My code -- very different than this -- works to this day, but only because I keep it up-to-date with the C standard. Over the years, it's been very useful, especially when moving from one platform to another.

Most code really doesn't need this much detail; "sizeof" is your friend. You can start with something like this:

  sizes.c

  #include <stdio.h>

  /* Nonzero if the lowest-addressed byte of an int is not its least significant byte. */
  int isBigEndian(void) {
    int x = 1;
    return(*(char *)&x != 1);
  }

  int main(void) {
    printf("This is a %s endian machine\n", isBigEndian() ? "big" : "little");
    /* sizeof yields a size_t, so the matching conversion is %zu, not %lu. */
    printf("char: %zu\n", sizeof(char));
    printf("short: %zu\n", sizeof(short));
    printf("int: %zu\n", sizeof(int));
    printf("long: %zu\n", sizeof(long));
    printf("long long: %zu\n", sizeof(long long));
    printf("float: %zu\n", sizeof(float));
    printf("double: %zu\n", sizeof(double));
    printf("long double: %zu\n", sizeof(long double));
    return 0;
  }

compile as: cc -std=c17 sizes.c

...and add whatever else you need as you go along. A version of my compiler information code is in my forthcoming book.

kqr2 · on Feb 24, 2019

Is there a link to your upcoming book?

usr1106 · on Feb 24, 2019

This looks pretty old. Any idea how current it still is?

A couple of years ago, when 64-bit computing became common, I made my own much more limited test to see what it changes:

https://usrmisc.wordpress.com/2012/12/27/integer-sizes-in-c-...

The tool in this article does not even test `long long` which probably means that it has not been updated for 20 years.

zokier · on Feb 24, 2019

    #define COPYRIGHT(c) 1993-9 Steven Pemberton, CWI. All rights reserved.

usr1106 · on Feb 24, 2019

The question was not about the copyright year, I saw that. But about what language/machine features have changed/been added and are incorrectly/not at all covered by this test code.

tyingq · on Feb 24, 2019

Perl's Configure that is run to build Perl in the first place has to figure out a lot of the same things. You can get all of it from the installed perl as well:

perl -e 'use Config qw(config_sh);print config_sh()'| less

Not as nicely formatted for sure, and lots of perl specific stuff you may not need. But piping that command above and grepping for 'size', for example, works well.

yesenadam · on Feb 24, 2019

Enquire was used in gcc. https://gcc.gnu.org/onlinedocs/gcc/Contributors.html says:

Steven Pemberton for his contribution of enquire which allowed GCC to determine various properties of the floating point unit and generate float.h in older versions of GCC.

"I wrote this originally for a piece of software (ABC, above) that had to run on any hardware, and require no particular knowledge from the person installing it. One day Richard Stallman passed by, and mentioned that they needed such a program for GCC. So I rewrote it and donated it. It produces the file float.h for the GCC compiler. The version here is a slightly more up to date version."

https://homepages.cwi.nl/~steven/

yesenadam · on Feb 25, 2019

And I hadn't heard of ABC, but it looks very similar to Python, grammatically at least, was apparently a major influence on it, and indeed van Rossum had worked on the ABC team.

https://en.wikipedia.org/wiki/ABC_(programming_language)

Someone · on Feb 24, 2019

It’s by Steven Pemberton, one of the first internet users in Europe, co-designer of HTML, CSS, XHTML, XForms, RDFa, and several other Web technologies (https://homepages.cwi.nl/~steven/).

lelf · on Feb 24, 2019

  sh enquire.c

Wow. That’s clever. (See the source.)

Y_Y · on Feb 24, 2019

Although it makes me wonder if its extension shouldn't be ".sh", since this polyglot metaprogram is really intended to be run as a shell script.

ordu · on Feb 24, 2019

No, it shouldn't. sh does not look at the file's extension; gcc does. Changing it could cause issues, like having to state the source language explicitly on the gcc command line.

saagarjha · on Feb 24, 2019

You can compile and run it too, though.

BlackLotus89 · on Feb 24, 2019

Seems to be from 2003 minimum.

http://web.archive.org/web/20030223030531/http://homepages.c...

Maybe older?

zokier · on Feb 24, 2019

The version history would indicate that the first versions are from the mid-80s.

lelf · on Feb 24, 2019

Another way:

  echo | cc -dM -E -

Narishma · on Feb 24, 2019

That's not portable. It only works on Unix systems.

tempodox · on Feb 24, 2019

What an intriguing piece of software. I wonder how much work it would be to bring it up to date for current compilers.

saagarjha · on Feb 24, 2019

It works on my compiler, but it's a bit confused about how virtual memory works on modern systems.

saagarjha · on Feb 24, 2019

A note on the values this program produces: most are implementation-defined (and some are generated in ways that exercise undefined behavior, it seems). Please, please, please don't rely on them being the same everywhere. If you're curious about your own machine, that's great, but don't be the person who hardcodes integer sizes or relies on chars being unsigned.

marvel_boy · on Feb 24, 2019

Can anybody list the values for the Xcode C compiler?

saagarjha · on Feb 24, 2019

Here's what I get with Apple LLVM version 10.0.1 (clang-1001.0.43.3) for x86_64-apple-darwin18.5.0:

  Produced by enquire version 5.1a, CWI, Amsterdam
     http://www.cwi.nl/~steven/enquire.html 
  Compiler claims to be ANSI C level 1
  
  Compiler names are at least 64 chars long
  Preprocessor names are at least 64 long
  
  SIZES
  char = 8 bits, signed
  short=16 int=32 long=64 float=32 double=64 bits 
  long double=128 bits
  char*=64 bits BEWARE! larger than int!
  int* =64 bits BEWARE! larger than int!
  func*=64 bits BEWARE! larger than int!
  Type size_t is unsigned long
  Type wchar_t is signed int
  
  ALIGNMENTS
  char=1 short=2 int=4 long=8
  float=4 double=8
  long double=16
  char*=8 int*=8 func*=8
  
  CHARACTER ORDER
  short: BA
  int:   DCBA
  long:  HGFEDCBA
  
  PROPERTIES OF POINTERS
  Char and int pointer formats seem identical
  Char and function pointer formats seem identical
  Strings are shared
  Type ptrdiff_t is signed long
  Dereferencing NULL causes a trap
  
  PROPERTIES OF INTEGRAL TYPES
  Overflow of a short does not generate a trap
  Maximum short = 32767 (= 2**15-1)
  Minimum short = -32768
  Overflow of an int does not generate a trap
  Maximum int = 2147483647 (= 2**31-1)
  Minimum int = -2147483648
  Overflow of a long does not generate a trap
  Maximum long = 9223372036854775807 (= 2**63-1)
  Minimum long = -9223372036854775808
  Maximum unsigned short = 65535
  Maximum unsigned int = 4294967295
  Maximum unsigned long = 18446744073709551615
  
  PROMOTIONS
  unsigned short promotes to signed int
  long+unsigned gives signed long
  
  PROPERTIES OF FLOAT
  Base = 2
  Significant base digits = 24 (= at least 6 decimal digits)
  Arithmetic rounds towards nearest
     Tie breaking rounds to even
  Smallest x such that 1.0-base**x != 1.0 = -24
  Smallest x such that 1.0-x != 1.0 = 2.98023259e-08
  Smallest x such that 1.0+base**x != 1.0 = -23
  Smallest x such that 1.0+x != 1.0 = 5.96046519e-08
  (Above number + 1.0) - 1.0 = 1.19209290e-07
  Number of bits used for exponent = 8
  Minimum normalised exponent = -126
  Minimum normalised positive number = 1.17549435e-38
  The smallest numbers are not kept normalised
  Smallest unnormalised positive number = 1.40129846e-45
  Maximum exponent = 128
  Maximum number = 3.40282347e+38
  Overflow doesn't seem to generate a trap
  There is an 'infinite' value
  Divide by zero doesn't generate a trap
  Arithmetic uses a hidden bit
  It looks like single length IEEE format
  
  PROPERTIES OF DOUBLE
  Base = 2
  Significant base digits = 53 (= at least 15 decimal digits)
  Arithmetic rounds towards nearest
     Tie breaking rounds to even
  Smallest x such that 1.0-base**x != 1.0 = -53
  Smallest x such that 1.0-x != 1.0 = 5.5511151231257839e-17
  Smallest x such that 1.0+base**x != 1.0 = -52
  Smallest x such that 1.0+x != 1.0 = 1.1102230246251568e-16
  (Above number + 1.0) - 1.0 = 2.2204460492503131e-16
  Number of bits used for exponent = 11
  Minimum normalised exponent = -1022
  Minimum normalised positive number = 2.2250738585072014e-308
  The smallest numbers are not kept normalised
  Smallest unnormalised positive number = 4.9406564584124654e-324
  Maximum exponent = 1024
  Maximum number = 1.7976931348623157e+308
  Overflow doesn't seem to generate a trap
  There is an 'infinite' value
  Divide by zero doesn't generate a trap
  Arithmetic uses a hidden bit
  It looks like double length IEEE format
  
  PROPERTIES OF LONG DOUBLE
  Base = 2
  Significant base digits = 64 (= at least 18 decimal digits)
  Arithmetic rounds towards nearest
     Tie breaking rounds to even
  Smallest x such that 1.0-base**x != 1.0 = -64
  Smallest x such that 1.0-x != 1.0 = 2.71050543121376108531e-20
  Smallest x such that 1.0+base**x != 1.0 = -63
  Smallest x such that 1.0+x != 1.0 = 5.42101086242752217063e-20
  (Above number + 1.0) - 1.0 = 1.08420217248550443401e-19
  Number of bits used for exponent = 15
  Minimum normalised exponent = -16382
  Minimum normalised positive number = 3.36210314311209350626e-4932
  The smallest numbers are not kept normalised
  Smallest unnormalised positive number = 3.64519953188247460253e-4951
  Maximum exponent = 16384
  Maximum number = 1.18973149535723176502e+4932
  Overflow doesn't seem to generate a trap
  There is an 'infinite' value
  Divide by zero doesn't generate a trap
  Only 79 of the 128 bits of a long double are actually used
  It doesn't look like IEEE format
  
  Float expressions are evaluated in float precision
  Double expressions are evaluated in double precision
  Long double expressions are evaluated in long double precision
  Memory mallocatable ~= 138 Tbytes

mistrial9 · on Feb 24, 2019

> Only 79 of the 128 bits of a long double are actually used

that one is notable

jcranmer · on Feb 24, 2019

It's also incorrect.

The long double type on x86 systems is the x87 80-bit floating point number. This type has non-IEEE 754 semantics; in particular, it makes the leading bit before the binary point explicit rather than implicit. The storage semantics are, well, weird. If you use XSAVE/FXSAVE to save x87 FPU state, then the floating point registers take up 16 bytes of space. If you use the FSAVE instruction, each register instead takes up 10 bytes of space. Similarly, FLD/FSTP also use 10 bytes of space.

The i386 ABI states that long double is 96 bits with 32-bit alignment, while x86-64 uses 128 bits with 128-bit alignment. I suspect the reason for these sizes is to ensure that the size is a multiple of the alignment, as well as guaranteeing the correct layout when saving the floating point stack to a `long double stack[8];` array via (F)XSAVE in the x86-64 case.

dianeb · on Feb 25, 2019

Somewhat surprisingly, clang 10's long double on a mac is implemented with 64 mantissa digits (bits) (LDBL_MANT_DIG = 64) so the information as presented is correct for clang on a MacBook Pro 2017 with a 2.8 GHz Intel Core i7.

So, is this a bug? Well, perhaps, but not in enquire.c. Experimental usage shows that only 10 bytes (79 bits) are actually used. This is clearly implemented as an x87 80-bit floating point number.

mistrial9 · on Feb 25, 2019

so - a quick test on a different machine than described

on an x86_64 GNU/Linux VM with GCC 8.1, set a double dbf to MAX_DOUBLE (1.7976931348623157e+308), set a long double dbld to (dbf + 1.0), and compare them (dbf == dbld); both values print the same, and the compare returns 1 (True).

dianeb · on Feb 24, 2019

This is Intel's 80-bit floating point. The reason the size is 128 is that it needs to align on a word boundary.

zokier · on Feb 24, 2019

I think the code is more of historical interest than useful for today's compilers.