> serial computation on a fixed dataset that can be threaded per data point. So if I have 32 data points it's only that parallel

As a rule of thumb, within the same CPU generation a higher clock speed means better per-core performance. With only 32 data points you can keep at most 32 cores busy, so any cores beyond that sit idle, and you will probably get better performance from the 4 GHz 32-core CPU.
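To make that reasoning concrete, here is a back-of-the-envelope sketch. The function `relative_throughput` and the 64-core, 3 GHz part are made up for illustration and are not taken from the question. The model (busy cores times clock speed) is deliberately crude and ignores IPC, turbo behavior, memory bandwidth, and cache effects.

```python
def relative_throughput(n_tasks: int, cores: int, clock_ghz: float) -> float:
    """Crude upper bound on throughput: busy cores times clock speed."""
    busy_cores = min(n_tasks, cores)  # cores beyond n_tasks have nothing to do
    return busy_cores * clock_ghz


# Hypothetical specs, for illustration only.
fast_32 = relative_throughput(n_tasks=32, cores=32, clock_ghz=4.0)
slow_64 = relative_throughput(n_tasks=32, cores=64, clock_ghz=3.0)
print(fast_32, slow_64)  # 128.0 96.0
```

Under this model the extra 32 cores contribute nothing until you have more than 32 independent data points to hand out.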

> if I'm doing a massive serial calculation does an individual core get to use the full cache

A single core can generally use the whole shared L3 cache, but L1 and L2 caches are per-core, so the core doing the work only gets its own L1 and L2, not those belonging to the idle cores. So yes, there conceivably could be a benefit from the larger L3 cache.

On a broader note, these kinds of factors (frequency, cache size, parallelism) tend to be extremely workload-specific and unpredictable, so the only real way to find out what's faster is to measure your specific workload.
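If it helps, measuring can be as simple as the sketch below. `process_point` is a stand-in I made up (a small integer loop); swap in your real per-data-point computation. `best_time` is likewise just a hypothetical helper name, not part of any library.

```python
import time


def process_point(seed: int, steps: int = 200_000) -> int:
    """Placeholder serial computation for one data point (replace with yours)."""
    x = seed
    for _ in range(steps):
        x = (x * 1103515245 + 12345) % 2**31
    return x


def best_time(fn, *args, repeats: int = 5) -> float:
    """Return the fastest wall-clock time over several runs, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    seconds = best_time(process_point, 1)
    print(f"one data point: {seconds * 1e3:.2f} ms")
```

Run the same script on each candidate machine and compare the numbers. Taking the minimum over several repeats reduces noise from other processes, though it will not capture effects that only appear when all cores are loaded at once.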