I once designed and implemented a language where the biggest mistake I made was using 1-based indexing. At first it looked like it was going to be easier to understand and use, but 0-based indexing is actually much more convenient when doing index arithmetic.
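To make the index-arithmetic point concrete, here is a minimal sketch in Python. The function names are hypothetical and only for illustration. It compares a row-major offset and a wrap-around index under both conventions:

```python
def flat_index_0(row, col, ncols):
    """Row-major offset when row, col and the result are all 0-based."""
    return row * ncols + col


def flat_index_1(row, col, ncols):
    """Same mapping when row, col and the result are all 1-based."""
    return (row - 1) * ncols + (col - 1) + 1


def wrap_0(i, n):
    """Wrap a 0-based index into the range 0..n-1."""
    return i % n


def wrap_1(i, n):
    """Wrap a 1-based index into the range 1..n."""
    return (i - 1) % n + 1
```

The 1-based versions need a shift down and back up around every multiplication or modulo, which is where off-by-one mistakes tend to creep in.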
Not only do some of the examples seem to have been created specifically to show the benefit of the JIT compiler (see pisum, for example), most of them also do not really use the features of LAPACK/BLAS. The one that does (a matrix multiplication in rand_mat_mult) shows that all languages which use these optimized libraries beat JavaScript and handwritten C++ by a magnitude of 2.
Thus, be careful about generalising from this benchmark. Also, it is much simpler to work in a language with proper support for multi-dimensional matrix slicing than to do all of this by hand.
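As a rough illustration of what slicing "by hand" involves, here is a small Python sketch that extracts a sub-block from a matrix stored as a flat row-major list. The helper name `slice_block` and the storage layout are assumptions made for this example only:

```python
def slice_block(flat, ncols, r0, r1, c0, c1):
    """Return rows r0..r1-1 and columns c0..c1-1 of a row-major matrix.

    `flat` holds the matrix row after row and `ncols` is its width.
    The result is a list of rows; all indices are 0-based and the
    upper bounds are exclusive.
    """
    return [flat[r * ncols + c0 : r * ncols + c1] for r in range(r0, r1)]


# A 3x4 matrix containing 0..11, stored row by row.
matrix = list(range(12))
block = slice_block(matrix, 4, 1, 3, 1, 3)
```

With built-in multi-dimensional slicing (for instance `m[1:3, 1:3]` on a NumPy array), the offset arithmetic disappears entirely.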
As you note, the last benchmark does precisely what you suggest, comparing completely vectorized code that just calls BLAS. This is a really uninteresting benchmark, however, since everyone is close to C++ (although NumPy, Octave and R still introduce 20%, 69% and 165% overhead, respectively) — because you're really just comparing C++ against C++.
One of the key concepts behind Julia is that not only the end-user, but also the numerical library writer, should benefit from using a high-level language. In Julia, almost all of the library code is in Julia, and it's as fast as the library code written for R, Matlab or NumPy in C. There are also a lot of situations where vectorized code is either awkward or inefficient — especially in terms of creating a lot of unnecessary temporary arrays. Languages where the high-level language is slow force you to do everything vectorized — in Julia, you're not forced to do that. If you want to write a C-style scalar loop, you can and it will be fast (and you don't even need type annotations to make it fast, as shown by the benchmarks).
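A small sketch of the temporary-array point, written in plain Python for illustration (the function names are hypothetical). It only shows the shape of the two styles: in CPython the explicit loop is not fast, and the claim above is that in Julia it would be.

```python
def axpy_vectorized_style(a, x, y):
    """Compute a*x + y the way a chain of whole-array operations does."""
    scaled = [a * xi for xi in x]                  # full-size temporary
    return [s + yi for s, yi in zip(scaled, y)]    # second full-size list


def axpy_loop(a, x, y):
    """Compute a*x + y in one C-style pass with no intermediate array."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


x = [1.0, 2.0, 3.0]
y = [3.0, 4.0, 5.0]
result_vectorized = axpy_vectorized_style(2.0, x, y)
result_loop = axpy_loop(2.0, x, y)
```

Both versions produce the same values; the difference is that the first one allocates an intermediate list for every operation in the expression, and that cost grows with the length of the expression.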
V8 is really impressively fast, but JavaScript as a language is not very well suited to scientific or technical computing.
That is exactly the point of Julia. It gives you access to BLAS/LAPACK when you need it, and it still gives you something fast when vectorization is not natural. In other words, vectorization is not a prerequisite for performance. Julia wouldn't be the first language to attempt this: Matlab does it reasonably well, but R, Octave, Scilab, etc. do not.
Julia also has multi-dimensional arrays and slicing. If you had to do all this by hand, it would be simpler to just use C.
http://julialang.org/manual/arrays/
Running Mathematica scripts automatically from the command line is a major pain, as is just writing the benchmark code. A contribution would be most welcome.
Their site is down right now, so the 10 day old comments might be able to satisfy some amount of curiosity.