Speaking very loosely, primitives on the JVM are values which are represented directly in memory, instead of as pointers to objects on the heap. Clojure (again, very loosely) generally treats everything as a pointer to a heap object. There is no specialized equivalent for, say, a vector of shorts, or a map where values are floats. The compiler can emit specialized function signatures for... IIRC longs and doubles, but other types (e.g. byte, float) aren't directly accessible--they go through widening conversion. It's also easy for the compiler to quietly fail to recognize it can preserve primitives in some kinds of loops, so you wind up with what Java calls "autoboxing": wrapping a primitive in a corresponding Object type.
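To make that concrete, here's a rough sketch (toy definitions of my own, not from any real codebase) of what the compiler will and won't do:
```
;; Warn whenever arithmetic quietly falls back to boxed math
(set! *unchecked-math* :warn-on-boxed)

;; long/double hints get you a primitive function signature:
(defn add ^long [^long a ^long b]
  (+ a b))

;; Other primitive hints aren't supported on fn args/returns;
;; (defn f ^float [^float x] x) fails to compile with
;; "Only long and double primitives are supported".

;; Loop locals seeded with literals stay primitive, so this sums
;; without allocating a single Long:
(defn sum-to ^long [^long n]
  (loop [i 0, sum 0]
    (if (< i n)
      (recur (inc i) (+ sum i))
      sum)))
```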
How often is this necessary? I haven't been able to come up with an example where Java code performs faster than Clojure--I tried writing the equivalent of the Java version of this in Clojure.
Necessary depends on your use case! I spend a lot of time waiting on analyses, and the more operations I can test, the more bugs I can find. I probably invest more time in performance optimization than most people.
Regarding your specific example, uh, I don't know how you measured, but that feels... off. Here:
```
package scratch;

public class Sum {
  public static long sum(long[] longs) {
    long sum = 0;
    for (int i = 0; i < longs.length; i++) {
      sum += longs[i];
    }
    return sum;
  }
}
```
Summing an array of 10^8 longs like you suggested takes ~120 ms on my machine.
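(Where `longs` is a primitive array of the first 10^8 naturals--something like:)
```
(def longs (long-array (range 100000000)))
```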
```
user=> (quick-bench (Sum/sum longs))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 119.804166 ms
Execution time std-deviation : 9.124709 ms
Execution time lower quantile : 115.154905 ms ( 2.5%)
Execution time upper quantile : 135.597497 ms (97.5%)
Overhead used : 20.384654 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 15.4197 % Variance is moderately inflated by outliers
```
Your Clojure function, which sums a lazy range, takes ~4.7 seconds--about 40x slower.
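(I'm guessing at your definition here--presumably something like:)
```
(defn sum-clojure [n]
  (reduce + (range n)))
```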
```
user=> (quick-bench (sum-clojure 100000000))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 4.739575 sec
Execution time std-deviation : 346.895187 ms
Execution time lower quantile : 4.557732 sec ( 2.5%)
Execution time upper quantile : 5.324739 sec (97.5%)
Overhead used : 20.384654 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 15.3163 % Variance is moderately inflated by outliers
```
An idiomatic Clojure reduction over the same array of longs, just so we're measuring apples to apples, takes about 12 seconds--about 100x slower.
```
user=> (reduce + longs)
4999999950000000
user=> (quick-bench (reduce + longs))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 12.037821 sec
Execution time std-deviation : 433.661044 ms
Execution time lower quantile : 11.790637 sec ( 2.5%)
Execution time upper quantile : 12.779262 sec (97.5%)
Overhead used : 20.384654 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
```
Incidentally, this is one of the reasons I wrote `loopr` (https://aphyr.com/posts/360-loopr-a-loop-reduction-macro-for...). One of the iteration tactics it can compile to is iteration over arrays. Nowhere near Java--I think it's probably still boxing a fair bit, and I'd need to disassemble it to see--but it's still 10x faster than the standard reduce here, and ~10x slower than Java.
```
user=> (quick-bench (loopr [sum 0] [x ^"[J" longs :via :array] (recur (+ sum x))))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 1.259095 sec
Execution time std-deviation : 3.876650 ms
Execution time lower quantile : 1.256692 sec ( 2.5%)
Execution time upper quantile : 1.265747 sec (97.5%)
Overhead used : 20.321072 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
```
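If you want to close more of the gap without leaving Clojure, the usual next step is a hand-rolled loop over the array with primitive hints. A sketch (I haven't benchmarked this exact variant):
```
;; Keeps the index, accumulator, and array elements unboxed throughout.
(defn sum-array ^long [^longs a]
  (let [n (alength a)]
    (loop [i 0, sum 0]
      (if (< i n)
        (recur (unchecked-inc i) (+ sum (aget a i)))
        sum))))
```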
Edit: I wonder if there's a caching thing going on. I'm seeing wildly different performance depending on what number I put in. Does criterium affect this by isolating something?
I read a little more about this before you replied, but it seems like we're doing something different, or there's a bigger JVM difference than expected.
- The Clojure function takes me less than a second.
- (reduce + (long-array 100000000)) also takes less than a second.
- From what I understand, the compiler may have a special rule around reduce and + that uses unboxed numbers by default.
- Isn't it a big "if" that you'd already have a big array of primitive longs in Clojure? I can see how, if you've already walked a whole range of numbers and unboxed them into an array, the Java function would be fast. But it seems like it'd be easier to just use + (and again, that's running a lot faster on my machine for whatever reason).
I'm new to all of this, so I'm just being academic and trying to figure out what's going on.
Here's a recent example of some code in a hot path inside Elle, one of Jepsen's safety checkers. It does a lot in primitives, using packed structures and bitmasks to avoid pointer chasing.
https://github.com/jepsen-io/elle/blob/main/src/elle/BFSPath...
There was actually a Clojure version of this earlier that got pretty close perf-wise, but I wound up dropping to Java for it instead:
https://github.com/jepsen-io/elle/blob/913cbff5ebb19ba850c0a...
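If you haven't seen the pattern before, the core idea is something like this toy encoding (not Elle's actual layout): pack several small fields into a single long and pull them back out with shifts and masks, so traversal state is one primitive instead of an object full of pointers.
```
;; Toy packing: a 32-bit node id in the low bits, a 16-bit depth above it.
(defn pack ^long [^long id ^long depth]
  (bit-or (bit-shift-left depth 32)
          (bit-and id 0xFFFFFFFF)))

(defn unpack-id ^long [^long packed]
  (bit-and packed 0xFFFFFFFF))

(defn unpack-depth ^long [^long packed]
  (bit-and (unsigned-bit-shift-right packed 32) 0xFFFF))
```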