Your CPU has 4 cores, but 8 threads due to Hyper-threading. It's quite normal that not all programs will benefit from the additional threads. It would be interesting to see how the compiler performs on more physical cores.
Isn't compiling usually one of the cases that does benefit from filling all the hyperthreads, though? That's why people often cargo-cult "make -j<cores * 2>".
In case other people don't see why this would be the case: hyperthreads are good for running a second thread while the first one is stalled waiting on something, such as a memory read. Compilers often work with indirect data structures, such as irregular graphs and symbol tables, which don't cache well and can cause memory stalls.
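To make the memory-stall point concrete, here is a rough sketch in Go, not anything taken from the compiler itself. It compares a sequential pass over a slice with a pointer chase through the same slice in random order. The names buildChain, chase and scan are made up for this illustration, and the timings will vary a lot by machine.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// buildChain links n slots into one random cycle:
// next[i] is the slot to visit after slot i.
func buildChain(n int) []int {
	perm := rand.Perm(n)
	next := make([]int, n)
	for i, p := range perm {
		next[p] = perm[(i+1)%n]
	}
	return next
}

// chase follows the chain for len(next) steps, summing the slots visited.
// Each load address depends on the previous load, which is the access
// pattern of walking a graph or a linked symbol table.
func chase(next []int) int {
	sum, i := 0, 0
	for range next {
		sum += i
		i = next[i]
	}
	return sum
}

// scan reads the same slice front to back, which caches and prefetches well.
func scan(next []int) int {
	sum := 0
	for _, v := range next {
		sum += v
	}
	return sum
}

func main() {
	next := buildChain(1 << 22) // size is an arbitrary choice for the demo

	start := time.Now()
	a := scan(next)
	scanTime := time.Since(start)

	start = time.Now()
	b := chase(next)
	chaseTime := time.Since(start)

	// Both visit every slot exactly once, so the sums match.
	fmt.Println("same sum:", a == b)
	fmt.Println("scan: ", scanTime)
	fmt.Println("chase:", chaseTime)
}
```

On most machines the chase should be noticeably slower even though it does the same amount of arithmetic. That idle time waiting on memory is what a second hyperthread can fill.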
That is a quote from the CL, not the hardware of the person you replied to. That being said, I'm pretty interested to see how this would perform on other systems, specifically production servers.
All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop.
First, going from tip to this CL with c=1 costs about 3% CPU and has almost no memory impact.
Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc.
From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%.
Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time.
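For anyone who wants those percentages as speedup factors, here is a quick back-of-the-envelope sketch. It assumes "real time is down X%" means the new wall-clock time is (1 - X) of the old one. The helper names speedup and efficiency are made up here, not from the CL.

```go
package main

import "fmt"

// speedup converts a fractional reduction in wall-clock time
// (0.40 means "real time is down 40%") into a speedup factor.
func speedup(reduction float64) float64 {
	return 1 / (1 - reduction)
}

// efficiency is the speedup divided by the number of workers;
// 1.0 would mean perfect linear scaling.
func efficiency(reduction float64, workers int) float64 {
	return speedup(reduction) / float64(workers)
}

func main() {
	// Reductions quoted in the CL description: 20-30% at c=2, ~40% at c=4.
	fmt.Printf("c=2: %.2fx to %.2fx\n", speedup(0.20), speedup(0.30))
	fmt.Printf("c=4: %.2fx, efficiency %.2f\n", speedup(0.40), efficiency(0.40, 4))
}
```

That works out to roughly 1.25x to 1.43x at c=2 and about 1.67x at c=4, so each extra worker buys less than the one before it. That fits the observation that going beyond c=4 stops helping real time.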