Honest question: what is so bad about having to create one process per core?
If you have lots of data you can use shared memory, so the performance should be similar, no?
For shops that really care about highly performant parallel tasks, the difference matters. For people who just want more performance for free, it's too much work compared with what other languages provide.
If OCaml were already popular I don't think the lack of parallelism would be a huge issue, but it's niche, which is a huge downside to any language in a business context. The upsides the language provides in terms of performance, ergonomics, and maintainability need to overcome that downside. All the issues I listed mean that OCaml can't generally pass that bar.
I don't know. Maybe because it is GC-only? It doesn't have custom allocators, so there may be no easy way to map an object to some location in shared memory.
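For what it's worth, flat numeric data is the one case that can be shared fairly easily: a Bigarray lives outside the GC heap and can be backed by a memory-mapped file. Below is a rough sketch of the one-process-per-core approach under that restriction. The worker count, chunk size, and temp-file name are made up for illustration, and it assumes the `unix` library and a POSIX system with `fork`. It does not help with ordinary heap values (records, lists, closures), which is the harder problem described above.

```ocaml
(* Sketch only: fork one worker per assumed core and let each fill its
   own slice of a float array that is shared through a mapped file.
   Build with the unix library, e.g. ocamlfind ocamlopt -package unix. *)
let () =
  let n_workers = 4 in          (* hypothetical core count *)
  let chunk = 1000 in           (* hypothetical slice size per worker *)
  let path = Filename.temp_file "shared" ".bin" in
  let fd = Unix.openfile path [ Unix.O_RDWR ] 0o600 in
  (* shared = true: writes go to the file mapping and are visible to
     every process that inherited it; the file is grown to fit. *)
  let data =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true
         [| n_workers * chunk |])
  in
  let pids =
    List.init n_workers (fun w ->
        match Unix.fork () with
        | 0 ->
            (* Child: write only to this worker's slice, then exit. *)
            for i = w * chunk to ((w + 1) * chunk) - 1 do
              data.{i} <- float_of_int i
            done;
            exit 0
        | pid -> pid)
  in
  (* Parent: wait for all workers, then read their results directly. *)
  List.iter (fun pid -> ignore (Unix.waitpid [] pid)) pids;
  let sum = ref 0.0 in
  for i = 0 to Bigarray.Array1.dim data - 1 do
    sum := !sum +. data.{i}
  done;
  Printf.printf "sum = %.0f\n" !sum;
  Unix.close fd;
  Sys.remove path
```

Each worker touches a disjoint slice, so no locking is needed here; anything with overlapping writes would need its own synchronization, and anything that is not a flat array of numbers would have to be serialized into the buffer by hand.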