Thanks for the perspective. You're a lot more knowledgeable about this than me and I appreciate you taking the time to explain it. I think I see your point about why people are defensive. If you're coming from Python or something, then performance does look like a priority. That said, Clojure is not only not trying to be Fortran, it's also kinda not trying to match Java in terms of performance granularity. You can in effect write Java in Clojure, but the language seems to be discouraging you. Things start to look ugly and hard to parse.
> The philosophy is that the choice of data-structure matters, and when performance is required, the programmer should make the right data-structure choices and use the appropriate data-structure optimized functions.
Yeah, that's very sensible. When you know you're not just getting some seq, and you know you're going to get a vector, then you should use vector-specific functionality. But I sort of don't get the impression the core library is compartmentalized this way. Maybe I've somehow overlooked it. For instance, I don't think there are a lot of functions that take a vec as input, so in the vector case .. uhh, what is the vector-specific way to access the first element quickly? (not being snarky, genuinely asking)
> Clojure wants to quickly make you realize you should not try to use the default and most straightforward way to implement this, but instead you need to immediately design for that performance.
Yeah, things like VisualVM quickly make this apparent :) I guess my mild annoyance is at how ugly the code starts to get when it's optimized.
Tangentially, you might be a good person to ask: Do you happen to know why Java arrays aren't first class with their own syntax sugar like [], {}, #{} etc.? Why are the docs discouraging their use? Is it because the other datatypes can't be coerced to an array? I often want to use them, but I feel I'm working against the grain of the language and doing something wrong. But I don't really appreciate the tradeoff involved.
> You can in effect write Java in Clojure, but the language seems to be discouraging you. Things start to look ugly and hard to parse
Clojure is really a functional language at heart, so when you need to go down to an imperative model for performance, I agree that's when things start to feel not quite right and you begin to fight the language a little. The lack of standard imperative loops becomes an obvious annoyance. But Clojure's stance is that it's hosted for a reason: just write those imperative pieces in Java, which is designed from the ground up for imperative programming. Clojure doesn't try to pretend it can do imperative better than Java.
I'd say on the performance scale, Clojure tries to be a fast functional dynamic language. A lot of the dynamic behavior has a runtime cost, so there's a limit in terms of raw performance: the best it can hope to achieve is to be as fast as statically typed Java.
I think as a fast functional dynamic language it succeeds, and this is where most of the effort has gone in terms of optimization.
> When you know you're not just getting some seq, and you know you're going to get a vector - then you should use vector specific functionality. But I sort of don't get the impression the core library is compartmentalized this way
> For instance I don't think there are a lot of functions that take a vec on input, so in the vector case .. uhh what is the vector specific way to access the first element quickly? (not being snarky, genuinely asking)
As the cheatsheet shows, there are a few different ways to get elements out of a vector by index:
([1 2 3] 0) ;> 1
(get [1 2 3] 0) ;> 1
There isn't a vector-specific function for getting the first element like there is for sequences, though, so you'd just get the element at index zero.
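To make that a bit more concrete, here's a small sketch of the index-based lookups next to `first`. The vector `v` is just a name I picked for illustration:

```clojure
(def v [1 2 3])

;; Indexed lookups on the vector itself:
(v 0)            ; a vector is a function of its index; throws if out of bounds
(get v 0)        ; returns nil (or a supplied default) if out of bounds
(nth v 0)        ; throws if out of bounds unless a default is supplied
(nth v 10 :none) ; the default is returned instead of throwing

;; `first` also works on a vector, but it goes through the seq
;; abstraction (it calls `seq` on its argument first).
(first v)

;; `peek` is the vector-specific way to read the *last* element.
(peek v)
```

Which of the three indexed forms you pick mostly comes down to how you want out-of-bounds access to behave.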
For manipulating data in vectors, there's mapv and filterv and reduce, and you can leverage all of the transducer functions as well.
Basically, the trick for performance with vectors is to use transients when creating them, use reduce to iterate over them (or a function based on reduce, like mapv, filterv and all transducers), and use their index for getting elements from them.
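A minimal sketch of those three habits together. The function name `squares-upto` and the var `xs` are made up for the example:

```clojure
;; Build a vector with a transient: mutate locally, then freeze it.
(defn squares-upto [n]
  (loop [i 0
         acc (transient [])]
    (if (< i n)
      (recur (inc i) (conj! acc (* i i)))
      (persistent! acc))))

(def xs (squares-upto 5)) ; [0 1 4 9 16]

;; Iterate with reduce rather than walking first/rest by hand.
(reduce + 0 xs) ; 30

;; mapv / filterv build the result vector eagerly, with no lazy seq in between.
(mapv inc xs)      ; [1 2 5 10 17]
(filterv even? xs) ; [0 4 16]

;; Transducers fuse several steps into a single pass.
(into [] (comp (filter even?) (map inc)) xs)       ; [1 5 17]
(transduce (comp (filter even?) (map inc)) + 0 xs) ; 23
```

Note that the transient never escapes `squares-upto`, so callers only ever see an ordinary immutable vector.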
> Tangentially, you might be a good person to ask: Do you happen to know why Java arrays aren't first class with their own syntax sugar like [],{},#{} etc ?
Clojure is very opinionated towards functional programming and immutability. Arrays are an imperative construct with mutation and specific memory address lookups as their basic premise. Clojure is also opinionated towards data-driven/value-driven modeling. Arrays aren't very good for that: they are homogeneous and they don't have value semantics.
Basically, from Clojure's point of view, arrays are a specialized tool that you should only need in special cases, because of some special non-functional requirement. And because they don't follow the standard Clojure data expectations (immutable and heterogeneous with value semantics), Clojure wants any use of an array to be very explicit and obvious, so it uses different functions for them, which also happen to be better suited to arrays.
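Here's a rough sketch of what those array-specific functions look like in practice. The helper names `sum-longs` and `double-all` are hypothetical, only there for the example:

```clojure
(defn sum-longs
  "Sums a long[] with areduce; the ^longs hint avoids reflection on aget."
  [^longs arr]
  (areduce arr i acc 0 (+ acc (aget arr i))))

(defn double-all
  "Returns a new long[] with every element doubled, using amap."
  [^longs arr]
  (amap arr i ret (* 2 (aget arr i))))

;; No literal syntax: creating an array is an explicit function call.
(def arr (long-array [1 2 3 4]))

(aset ^longs arr 0 10)  ; in-place mutation, visible to everyone holding arr
(sum-longs arr)         ; 19
(vec (double-all arr))  ; [20 4 6 8]

;; No value semantics: two arrays with the same contents are not =.
(= (long-array [1 2]) (long-array [1 2])) ; false
(= [1 2] [1 2])                           ; true
```

The last two lines are really the heart of it: a vector is a value, an array is a place.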
> I often want to use them, but I feel I'm working against the grain of the language and doing something wrong. But I don't really appreciate the tradeoff involved
I mean, you shouldn't use Clojure if you don't favour the functional programming style. One of the tradeoffs is slightly worse performance, because computers are imperative machines. People who use Clojure are happy with it because they believe they get better modularity, reuse, safety and productivity from functional programming, and lose so little efficiency and performance that it is a good tradeoff: users won't notice the difference, but the programmer gains a lot of benefits.
Clojure cares about making sure that it delivers a fast and efficient implementation of a dynamic functional programming language. That's why it provides a good implementation of persistent collections, which are non-trivial to write; that's why it batches lazy sequences; that's why it uses the JVM's JIT runtime, which does a great job of dealing with lots of small temporary objects and optimizing away extra indirections. That's why it goes out of its way to allow type hints for faster dispatch, and to provide records over maps, array-maps for small maps, transients for local mutation over immutable collections, transducers and reduce for fast iteration in data manipulation, primitive handling in loop/recur, etc.
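A few of those in one small sketch. The names `str-len`, `sum-to` and `Point` are invented for illustration:

```clojure
;; Type hint: with ^String the compiler can emit a direct .length call
;; instead of a reflective lookup at runtime.
(defn str-len [^String s]
  (.length s))

;; Primitive handling in loop/recur: i and acc are primitive longs
;; inside the loop, so nothing gets boxed per iteration.
(defn sum-to
  "Sums the integers 0..n-1."
  [^long n]
  (loop [i 0
         acc 0]
    (if (< i n)
      (recur (inc i) (+ acc i))
      acc)))

;; Records: still immutable and map-like, but with fixed fields.
(defrecord Point [x y])

(def p (->Point 1 2))
(:x p)         ; 1
(assoc p :x 5) ; a new Point; p itself is unchanged

(str-len "hello") ; 5
(sum-to 5)        ; 10
```

Turning on `*warn-on-reflection*` at the REPL is the usual way to find out where a hint like `^String` would actually make a difference.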
I think you'd describe Clojure's performance target as: sufficiently performant. The claim is that it can give you dynamic and immutable data semantics that are sufficiently efficient for practical applications, with paths towards selective optimization where needed, like allowing full mutability in deftypes.
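As a rough sketch of that deftype escape hatch (the `Counter` protocol and `MutableCounter` type are made-up names for illustration, not anything from core):

```clojure
(defprotocol Counter
  (increment! [this] "Bumps the counter in place and returns it.")
  (current [this] "Returns the current count."))

;; ^:unsynchronized-mutable makes n a plain mutable field. It is only
;; settable from inside the type's own method bodies, and it is NOT
;; safe to share across threads.
(deftype MutableCounter [^:unsynchronized-mutable ^long n]
  Counter
  (increment! [this]
    (set! n (inc n))
    this)
  (current [this] n))

(def c (->MutableCounter 0))
(increment! c)
(increment! c)
(current c) ; 2
```

The mutation is opt-in per field and hidden behind the protocol, so it stays about as explicit as the language can make it.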
The one thing it almost never does, though, is give you unmanaged mutability. It's pretty resistant to that, which is why you won't find anything faster than a volatile! (except for arrays). It doesn't want you to start having shared memory bugs in concurrent contexts. Libraries exist for that, though.
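For reference, a minimal sketch of a volatile! used as a local mutable box. The function name `running-total` is hypothetical:

```clojure
(defn running-total
  "Sums xs imperatively, using a volatile! as a local mutable cell."
  [xs]
  (let [total (volatile! 0)]
    (doseq [x xs]
      ;; vswap! is a plain read-modify-write with no atomicity guarantee,
      ;; unlike swap! on an atom, so keep the volatile thread-local.
      (vswap! total + x))
    @total))

(running-total [1 2 3]) ; 6
```

In real code you'd just write `(reduce + xs)` here; the point is only to show the shape of the API.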
Finally, I wouldn't say it discourages the use of arrays: when arrays are what you need, it gives you the tools to use them. But when arrays are what you want because you have performance OCD, and you don't actually need that level of performance, it would rather you used functional, immutable, heterogeneous data structures with value semantics, along with the functions that go with them, and it will nudge you towards those.
The cheatsheet is very handy! Thank you. I'd completely forgotten about it. I usually end up poking around Clojuredocs, but it's very unstructured and you're just hunting for a function that fits the bill.
And yeah, your analysis of functional programming paradigms in Clojure mirrors how I think of it as well. I see that a lot, how the language is really pushing you to program functionally and almost intentionally making imperative/mutable programming painful so that you don't do it :)
Now that you've explained it, it does seem obvious that arrays don't work too well with immutable structures. You're right about the OCD. C++ kinda instills that in you a bit. It feels "wrong" to have sequences of the same datatype (like pixels in an image) use the same container as sequences of varying data types (like a hiccup vector). For instance, cache misses are something one is used to worrying about in C++, while in Clojure it's not even on the radar.
I really appreciate you taking the time to explain things, especially so deep down in a thread.
Fluokitten and Neanderthal are great for high performance numerics. Can't really get any faster than Neanderthal (which uses MKL and the GPU for compute). And Fluokitten can make working with arrays a bit nicer, in that it'll look a bit more like working with sequences, though you still need to be careful about the functions you use with it to avoid boxing.
oh woah very cool! sorry for the late response - I wanted to take the time to look at it more closely. Thanks for that info. I'd never really looked into Fluokitten. It looks awesome :D
I've played with Neanderthal but the MKL dependency is a bit annoying (doesn't work on my Pinebook Pro and I can't ship binaries with it) - so I'd overlooked Fluokitten entirely. But this is very complementary to the Clojury way of doing stuff and has no dependencies. I'm def going to try to work this into my standard toolbox :)) I've only done the normal java-interop stuff in the past but this looks like a definite step up