I don't think so (https://www.tensorflow.org/xla/developing_new_backend). And even if so, it's not relying on LLVM optimizations (for TPU) or the LLVM API, which is why some of these had to be punted back up to the Julia optimizer for XLA, and that was done in a third-party package!
The point is that Julia's design, type system, and multiple dispatch facilitate writing dynamic yet highly optimized code for a variety of backends, even those requiring static semantics (unlike LLVM).
There is no way you can look at that paper (or the Flux ecosystem, or the probabilistic programming languages, or the SSA IR autodiff) and chalk up Julia's success to just LLVM.
The GPU backend is (and so's the CPU backend, but that's currently too slow to get much information from). The TPU backend isn't (which is why we targeted XLA in the first place).