I've started working with Flux [1] in Julia, and it's so elegant and such a great experience :). Just look at this definition of a U-net model for image segmentation: https://gist.github.com/haampie/bceb1d59fd9a44f092f913062e58.... Beyond that, you can write your own custom loss functions in pure Julia that run efficiently on the GPU, you get language-level automatic differentiation, and there's proper integration with other packages. If people are moving away from TensorFlow, Flux could be a solid alternative as well.
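To give a sense of what that looks like, here's a minimal sketch of a model with a hand-written loss (the layer sizes and dummy data are made up by me, and the implicit-parameters gradient/Flux.params style is the classic Flux API, which may differ between versions):

    using Flux

    # A small model: a callable Chain of Dense layers
    model = Chain(Dense(10, 32, relu), Dense(32, 1))

    # A custom loss written in plain Julia; Flux's AD (Zygote) differentiates it as-is
    loss(x, y) = sum(abs2, model(x) .- y) / length(y)

    # Dummy data, then gradients with respect to the model's parameters
    x, y = rand(Float32, 10, 16), rand(Float32, 1, 16)
    gs = gradient(() -> loss(x, y), Flux.params(model))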
IMO, these kinds of functional abstractions look nice on paper, but are a pain in the ass to actually use. In practice, you'll want to print out things in between each layer, you'll want to log each layer's activations, you might want to redirect a layer into another network, etc.
Both PyTorch and TensorFlow have purely functional abstractions, but they're relegated to very basic functionality.
Julia gives you the best of both worlds. And more.
All those pretty function-like things you see above are actually callable objects that can be introspected, intercepted, and dispatched on, so you can mix and match pure object abstractions, pure function abstractions, and objects with function-like behavior depending on the use case.
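As a rough sketch of what I mean (MulLayer and forward_with_logging are my own illustrative names, not Flux API), a layer is just a callable struct, and because layers are ordinary values you can intercept them, log their activations, or dispatch on their types:

    using Flux

    # A custom "layer" is just a callable struct; no framework base class needed
    struct MulLayer
        s
    end
    (l::MulLayer)(x) = l.s .* x
    Flux.@functor MulLayer   # lets Flux treat `s` as a trainable parameter

    # Layers are ordinary values, so a small helper can log every activation
    function forward_with_logging(chain::Chain, x)
        for layer in chain.layers
            x = layer(x)
            @info "after $(typeof(layer))" size(x)
        end
        return x
    end

    m = Chain(Dense(4, 8, relu), MulLayer([2.0f0]))
    forward_with_logging(m, rand(Float32, 4, 3))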
This is because Julia's philosophy is to make differentiable programming a completely seamless, normal programming paradigm that interoperates with all standard code patterns.
And all this is only possible because of a unique mix of excellent reflection, code generation (including hooking into the compiler from third-party packages, which allows source-to-source autodiff and GPU codegen), fast generic/parametric polymorphism even across packages, multiple dispatch, and macros, among other technologies.
It's not quite at the stage of "write any normal Julia code and it just works", as there are some rough edges being worked out, but that's the vision, and even now it's leaps and bounds above PyTorch.
Sorry, can you clarify what you mean by a purely functional abstraction?
Flux is incredibly flexible, is not limited to purely functional code, and is capable of many things that are straight-up impossible or infeasible in PyTorch or TensorFlow (with or without their "purely functional" abstractions).
Super late reply, so it's likely you won't see this... (Too bad HN doesn't notify on replies).
I'm not complaining about Flux in general; I'm talking about the specific example (the U-net) he brought up to claim that Julia is so elegant.
Can you elaborate on what Flux can do that PyTorch can't?
At this point, the DiffEqFlux neural differential equation library fits the neural ODE example from the original paper in 29 seconds [1]. The forward pass of torchdiffeq on trivial ODEs without neural networks takes 47 seconds [2] (and of course adding in neural networks makes it a lot more expensive). This is a massive real-world difference: it means the Julia packages can build animations from real-time fitting plots, while the same fit is an hours-long ordeal in PyTorch. Being able to use optimized packages instead of hardcoding a simple version of things really pays off in the long run, and here using a real ODE solver suite is not a small difference but multiple orders of magnitude. That's the real benefit of differentiable programming.
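For context, setting up a neural ODE in DiffEqFlux looks roughly like this (a sketch only: the exact API has shifted between versions, the network sizes are arbitrary, and target_data stands in for whatever trajectory you're fitting):

    using DiffEqFlux, OrdinaryDiffEq, Flux

    u0    = Float32[2.0; 0.0]                      # initial condition
    tspan = (0.0f0, 1.5f0)
    tsave = range(tspan[1], tspan[2]; length = 30)

    # The dynamics du/dt are themselves a small neural network
    dudt = Chain(Dense(2, 50, tanh), Dense(50, 2))

    # NeuralODE wraps the network around a real ODE solver (Tsit5 from OrdinaryDiffEq)
    node = NeuralODE(dudt, tspan, Tsit5(); saveat = tsave)

    predict(u0) = Array(node(u0))
    # target_data is assumed: the trajectory you want the neural ODE to reproduce
    loss(u0, target_data) = sum(abs2, target_data .- predict(u0))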
[1] https://github.com/FluxML/Flux.jl