
I might be persuaded to see things from your point of view if you can show that Julia's tensors meet my criteria. I might be wrong about this - I do not know much about Julia, and it might very well be capable of this. But these are the kinds of capabilities that, if the language had them, it would advertise. If it can do it, I'll apologize and change the line to show Julia in a more positive light.

The reason I started talking about inlining of monads here is that ultimately they are just functions, and Spiral's tensors are not part of the language but just functions as well. So if it cannot do monads, then it probably cannot do tensors satisfactorily either.

It was pointed out to me that tensor slicing could serve as a substitute for partial application. What I am wondering about next is the degree of flexibility Julia's tensors have in the context of GPU kernels. Let me illustrate this with a short example. The following is how the in-place map CUDA kernel is written in Spiral. It seems simple, but I'll go through it from the top to show the characteristics and flexibility that Spiral's tensors have.

    met map' w f in out = 
        inl in, out = zip in, zip out
        assert (in.dim = out.dim) "The input and output dimensions must be equal."
        inl in = flatten in |> to_dev_tensor
        inl out = flatten out |> to_dev_tensor
        inl in_a :: () = in.dim
        
        inl blockDim = 128
        inl gridDim = min 64 (divup (s in_a) blockDim)

        w.run {
            blockDim gridDim
            kernel = cuda // Lexical scoping rocks.
                grid_for {blockDim gridDim} .x in_a {body=inl {i} ->
                    inl out = out i
                    inl in = in i
                    out .set (f in.get out.get)
                    }
            }
The first line is `inl in, out = zip in, zip out`. Since the inputs to the map function can be arbitrary data structures, `zip` iterates over them and zips the tensors inside into a single one.

In Spiral the tensors have a tuple-of-arrays layout by default, and it is trivial to switch between AOT (array of tuples) and TOA (tuple of arrays) representations. In fact, you can mix and match tensors that have both layouts internally. Zipping tensors in Spiral just merges them into one with the TOA layout, though the subtensors could be AOT.
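
For comparison, the tuple-of-arrays idea itself can be expressed in Julia with the StructArrays.jl package (a minimal sketch of mine; whether this meets the full criteria is the question at hand):

    using StructArrays

    # Two parallel arrays presented as one array of named tuples (TOA layout).
    s = StructArray((a = [1.0, 2.0, 3.0], b = [10.0, 20.0, 30.0]))

    s[2]    # (a = 2.0, b = 20.0) - assembled on the fly from both arrays
    s.a     # the underlying contiguous array for the `a` field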

Dealing with these layout transformations is a significant issue in numerical computation, and one that Spiral solves completely. My impression that Julia cannot do this is not something I know for sure. Rather, I do not think this can be done within the confines of a parametric type system; I think it would require an intensional type system like the one Spiral has.

    inl in = flatten in |> to_dev_tensor
    inl out = flatten out |> to_dev_tensor
`flatten` checks that the tensor is contiguous and flattens it into a 1d tensor. The `to_dev_tensor` is just a formality that I could not get rid of. It takes the pointers out of the references holding them, which allows the tensors to be captured lexically before being passed into the kernel. Without this, a type error would occur.
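
For a rough point of reference, the closest standard Julia analogue of `flatten` (my sketch; not Spiral's semantics, and without the contiguity machinery) is:

    # vec gives a 1-d view over the same contiguous memory, without copying.
    A = rand(Float32, 4, 4)
    v = vec(A)      # length-16 vector sharing A's storage
    v[1] = 0f0      # writes through to A[1, 1]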

    kernel = cuda // Lexical scoping rocks.
        grid_for {blockDim gridDim} .x in_a {body=inl {i} ->
            inl out = out i
            inl in = in i
            out .set (f in.get out.get)
            }
The actual kernel itself is quite small and trivial. `cuda` is just a parser abbreviation for `inl {blockDim gridDim} ->`. In other words, it is not a special language feature, but a regular function.

When I was doing this in F#, I had to pass in every single argument of the function by hand - meaning the tensor pointers, their offsets, their sizes, and their dimension ranges for each and every tensor - and I also had to make wrapper classes on top of that. If a map operation needed more arguments, I would have needed to copy-paste a kernel, adjust it and the wrapper, and repeat the testing process all over again, all with very little type safety.

Spiral is not quite on the level of Futhark in terms of ergonomics, but it is a vast improvement over what came before. Note that there is no need for type annotations anywhere - the function is as generic as it could possibly be in a statically typed language. Because of the inlining guarantees, all of this has zero overhead.

Compared to Spiral, where do Julia and its tensors stand on the axis of generality? Would it be possible to write a simple map as simply as this?




If you're interested in advanced layout transformations in Julia, I talked about this a bit here: https://www.youtube.com/watch?v=uecdcADM3hY&t=1658s

As for your in-place map example, here's how that's written in Julia: https://github.com/JuliaGPU/CuArrays.jl/blob/98e32e055e96fe7...
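
For context, the user-facing side of that kernel is just Base's map! (a small usage sketch of mine, assuming the CuArrays package linked above):

    using CuArrays

    A = CuArray(rand(Float32, 1024))
    B = similar(A)
    map!(x -> 2f0 * x + 1f0, B, A)   # in-place map, compiled to a single GPU kernel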

Of course you may say that's cheating because it shifts the problem to the much more general broadcast machinery, but the implementation of that is not particularly tricky either. https://github.com/JuliaGPU/CuArrays.jl/blob/23c4c737a31aff9...

This code works with any multi-dimensional array, independent of layout considerations, though you may of course want to specialize based on layout for performance. Note also that this code is slightly more general than the one in your example, because it handles arrays that do not have efficient linear indexing.
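
For readers unfamiliar with that mechanism, a minimal CPU sketch of layout-agnostic indexing (my illustration, not the linked implementation):

    # CartesianIndices iterates any array shape efficiently, even for arrays
    # where linear indexing would be slow (e.g. certain views).
    function generic_map!(f, dest, src)
        for I in CartesianIndices(dest)
            @inbounds dest[I] = f(src[I])
        end
        return dest
    end

    generic_map!(sin, zeros(3, 3), rand(3, 3))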


Sorry about not seeing your reply yesterday; HN is not as helpful as Reddit, so I have to scan for replies manually, which can be error-prone. I just went through your talk. It was definitely interesting. Despite my confidence in the high-performance abstractions that Spiral allows, I do not have much experience doing high-performance optimizations myself, so such field reports motivate my efforts.

So Julia can definitely do flexible data structure layouts, but looking at your code I am starting to understand why it took an expert such as yourself to show me this example. Had you not told me that the link is to a map kernel, I would have had difficulty figuring it out on my own.

This shifts my objections from 'Julia cannot do it' to 'macro-heavy code is quite hard to grasp'. In Spiral you could have written all of this without the need for macros. Spiral does have (text) macros, but their purpose is language interop, not abstraction.

The use of macros is a typical anti-pattern in languages whose type systems are too weak for the problem at hand - I am fairly sure that with sufficient macro magic, any language that has them would be capable of performing these same optimizations.

One immediate benefit of a powerful type system that I'd like to point out is that Spiral allows reflection over tuples, which is a much more elegant solution to the problem of a function having variable arguments than the C-style varargs mechanism that I see Julia using in the examples you've shown me. Also, in Spiral reflection is done using pattern matching rather than if statements.

Another benefit is that Spiral has no need for separate static array machinery. Literals can be part of a variable's type, and even when they are not, unless explicitly prevented they tend to be propagated through function call boundaries (join points). Assuming they are known at compile time, the inlining of tensor dimensions in loops is actually the default behavior in Spiral. I've yet to do benchmarking to see how helpful that is, but I am sure it would help reduce register pressure.

Your examples are not quite enough to get me to apologize for my rude behavior, but they are enough to get me to shift my views a little, so I'll change the offending sentence so it reflects reality more accurately.


On Julia 0.7, the broadcast kernel will look much more like this (and actually the version currently used in CuArrays/CLArrays also doesn't use those macros):

https://github.com/jrevels/MixedModeBroadcastAD.jl/blob/mast...

    let I = @cuda_index(shape)
        @inbounds dest[I] = bc[I]
    end

There are several things to point out here:

1) The macro @cuda_index doesn't really need to be a macro, but like this it's the shortest way to compute the index and return when it is out of range (in CUDA/OpenCL you often need to schedule a loop with more iterations than you need); see the sketch after this list.

2) The `dest[I] = bc[I]` does all the unzipping of the arguments and dispatches to the correct fast getindex for a given layout of dest/bc via the type system plus tuples of the arguments (not using C-style varargs). This simple function makes it possible to get all the advantages of Julia's lazy fused broadcasting.
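
To make 1) concrete, here is a hand-written sketch of what such an index guard amounts to in a CUDAnative.jl kernel (a hypothetical expansion of mine, not the real @cuda_index from the linked repository):

    using CUDAnative

    function broadcast_kernel!(dest, bc)
        # Global linear index of this thread across the whole grid.
        i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        # Guard: the launch typically schedules more threads than elements.
        i > length(dest) && return
        @inbounds dest[i] = bc[i]
        return
    end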

Feel free to check out this great blog article to see what's possible with the new lazy broadcasting:

https://julialang.org/blog/2018/05/extensible-broadcast-fusi...
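
The headline feature there, in two lines (a trivial sketch of my own, not from the post):

    x = rand(1000); y = rand(1000)

    # @. fuses every dotted operation into a single loop with no temporaries.
    z = @. 2x + sin(y)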

You can also look directly at the broadcasting code in Julia Base, which is already a bit bloated from dealing with lots of edge cases, but is still pretty readable in parts:

https://github.com/JuliaLang/julia/blob/master/base/broadcas...


I like how the broadcasting comes out in the end in Julia. Besides that, Julia has quite a few niceties that I could only wish Spiral had, and I do think they matter significantly even if they are not vital. This is definitely praiseworthy work.

Since I haven't done so and I really should have: if you or anyone else still looking at this thread are curious what GPU programming without macros looks like, then take a peek at this:

https://github.com/mrakgr/The-Spiral-Language/blob/3a94ca644...

Comparing the Spiral and Julia code that has been linked so far, I feel that no one would ever guess, without knowing ahead of time, that Spiral is a static language and Julia a dynamic one based on these examples.

As an example of this in action, here is how layer normalization is implemented using the 'seq_broadcast' kernel, which does a sequence of reductions and broadcasts in registers. It is also used to implement softmax, for example.

https://github.com/mrakgr/The-Spiral-Language/blob/3a94ca644...


>curious what GPU programming without macros looks like, then take a peek at this:

I'm not sure if you missed my examples, or if there is some other misunderstanding going on ;)

GPU code in Julia works 100% without macros, but you are free to use them, so people do ;) Also, you can pretty much write the GPU kernels like pseudocode; I'm not sure how much simpler it can get.

With your linked Spiral code, I can't even really tell where the algorithm starts and where the setup code ends - which of course might easily be attributed to my unfamiliarity with Spiral ;)

I wrote an article some time ago about generic programming with Julia's GPU infrastructure:

https://techburst.io/writing-extendable-and-hardware-agnosti...

If you can make concrete examples of how things can be simpler than this, I'd be delighted to hear them :)

CUDA only (for no reason, actually) and with lots of macros, here is another article about GPU programming in Julia: https://mikeinnes.github.io/2017/08/24/cudanative.html

Do you have any benchmarks for the softmax kernel? If that kernel has optimal performance, it would be quite interesting. If it's subpar, it looks much longer than a simple version would be.


> With your linked Spiral code, I can't even really tell where the algorithm starts and where the setup code ends - which of course might easily be attributed to my unfamiliarity with Spiral ;)

Heh, I've noticed that the setup code tends to come out longer than the actual kernels. The kernel is inside `kernel = cuda`. Spiral is indentation sensitive like Python and F#.

The stuff inside {} is just module creation; think of them like tuples with named fields.

> Do you have any benchmarks for the softmax kernel? If that kernel has optimal performance, it would be quite interesting. If it's subpar, it looks much longer than a simple version would be.

No, I've yet to actually benchmark it. It really depends on how good a job NVCC does with the generic sequential reduce kernel. I'll do an in-depth analysis when I am done with all the neural network work I am doing currently, which might take a while.

https://github.com/mrakgr/The-Spiral-Language/blob/7ecd30bdf...

You can see how it is implemented here for the forward and backward parts.

https://github.com/mrakgr/The-Spiral-Language/blob/7ecd30bdf...

In the actual cost function, it is a bit different since I fuse the forward and the backward parts.

It is a bit ugly because of all the type checking for whether the argument is a dual, but that is all done at compile time.

> https://mikeinnes.github.io/2017/08/24/cudanative.html

This is actually the loop unrolling example that I had in mind when I said that Julia needs macros.

https://github.com/mrakgr/The-Spiral-Language#3-loops-and-ar...

You can in fact get loop unrolling with just standard functions in Spiral; I go into it in the context of this chapter. It is achieved by pattern matching over tuples and recursion.

> If you can make concrete examples of how things can be simpler than this, I'd be delighted to hear them :)

Yes, by replacing metaprogramming with intensional polymorphism and inlining guarantees. Also, reflection should be done using pattern matching. I think this last one could be taken entirely seriously, as it would not involve a replacement of the entire type system.

All the claims in the article about inlining and specialization that make it sound like magic are what in general make me dubious about languages pretending to be speed kings. Yes, I am aware that GPU kernels do not require optimizers capable of having monads for breakfast, and that in the context of GPU programming, where those claims were made, they are probably true. But inlining is the sort of thing that matters more the more high-level a language is. For very high-level languages that desire speed, there isn't much choice but to make inlining guarantees a part of the language semantics, despite the added burden that puts on the user.

Since we are still at it, I have a question I need to ask.

Recently I've been informed that Julia is capable of GCing GPU memory. If this is fully integrated, that would be a major feature which is not possible in, say, .NET or Racket. I really wanted this in Spiral and could not get it in .NET.

By fully integrated, I mean much like for regular memory, where the GC takes note of the state of the system to decide when to do collection and defragmentation. If it can only make a thin wrapper with a finalizer (much like in .NET) around an unmanaged resource, then it is not a big deal.

Is it fully integrated or is it a wrapper style memory management?

If it is the latter, then that is too bad, but I'd suggest to the Julia devs that they work on making it fully integrated, as it would be a really good feature. Obviously, I can't do it in Spiral, as I would need to write my own VM and I have only so many years in my life.


>You can in fact get loop unrolling with just standard functions in Spiral.

We can do this too, actually - and Spiral's approach sounds very similar to what you'd do in Julia :)

https://gist.github.com/SimonDanisch/812e01d7183681ed414932c...

As a matter of fact, it's what the compiler devs keep recommending. As a developer I must say that it's pretty relaxing to take a metaprogramming shortcut from time to time ;) In general, we plan to offer a toolbox that employs these kinds of patterns for loop unrolling, tiling, etc., to make it easier to write GPU code without metaprogramming and to perform certain optimization/scheduling patterns based on a more trait-like system.
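
For the record, the macro-free version of that pattern is tiny (a toy sketch of mine, simpler than the linked gist): the recursion runs over the tuple's type, so the compiler unrolls it completely for small tuples.

    # Sum a tuple by recursing on its tail; each call peels one element off
    # the tuple type, so the loop disappears at compile time.
    unrolled_sum(t::Tuple{}) = 0.0
    unrolled_sum(t::Tuple) = first(t) + unrolled_sum(Base.tail(t))

    unrolled_sum((1.0, 2.0, 3.0))   # 6.0, with no runtime loop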

> Also reflection should be done using pattern matching

I feel like what Julia does is just more low-level right now - and you get pretty far with just multiple dispatch! If we needed more than that, we could put the effort into extending the language in that direction. But I think your use cases must be pretty advanced before you need something more complex. Can you recommend a nice article that shows off beautiful pattern matching for reflection? I see how it sounds nice, but I can't really imagine right now how it would improve my life.

Concerning inlining, I do think we actually force codegen to always inline when compiling for the GPU. I don't remember 100% anymore, but there might have been some problems with that. But it's definitely the goal to have a flexible compiler that you can fine-tune for specific domains like GPU programming. And if we don't get it as part of the compiler, we can definitely do things like that with https://github.com/jrevels/Cassette.jl/

Apart from that, I'm pretty happy that our GPU compilation infrastructure isn't making Julia less useful as a general-purpose language ;)

>Is it fully integrated or is it a wrapper style memory management?

It's wrapper style. We're playing around with different kinds of hacks to make it less bad, but those are hacks you could do in any GC'd language. The good news is that the Julia compiler devs take GPU computing seriously and have promised to work on a full integration that is aware of the GPU hardware.


> As a matter of fact, it's what the compiler devs keep recommending.

I definitely agree with them on this matter. Macros might have a role in language development, but they should not be a stand-in for compiler optimizations. For safety and speed, the type system should be there.

https://github.com/mrakgr/The-Spiral-Language/blob/0741959ac...

Here is how the example you've shown could be done in Spiral. Maybe I should add explicit support for literal testing in pattern matching, but it is not a pattern that comes up too often by itself.

> I feel like what Julia does is just more low-level right now - and you get pretty far with just multiple dispatch!

What you say is exactly right: pattern matching compiles down to those low-level operations, so there is no reason at all why those low-level operations should be done by hand. Pattern matching does not depend on a particular type system, or on whether the language is dynamic or static. There are only so many good ideas in programming languages, and this is one of them.

Though there is some overlap between pattern matching and multiple dispatch, their roles are different. The purpose of multiple dispatch is extensibility, but the purpose of pattern matching is destructuring.

https://stackoverflow.com/questions/2502354/what-is-pattern-...

Note the great disparity between the F# examples that do pattern matching and the C# examples that do manual reflection.

https://github.com/mrakgr/The-Spiral-Language/blob/0741959ac...

Here are a few very simple examples of it in action in Spiral. I use them as compiler tests.

https://github.com/mrakgr/The-Spiral-Language/blob/0741959ac...

Here is quite a complex example of how it is used in action. I won't go into detail here, but you can see how I repeatedly match on the contents of a module at different times in order to get more generic functionality out of the kernel.

The particular kernel shown here is the most complex one that exists in the library right now - I have yet to get to things like generic matrix multiplication and convolution, but I'll get there eventually.

> I see how it sounds nice, but I can't really imagine right now how it would improve my life.

Let me just say that it is really difficult to know ahead of time how a particular language feature will affect your programming life. I could have said the same thing about first-class functions back in 2015. I am sure there will be such features in the future that I can't even imagine right now.

> And if we don't get it as part of the compiler, we can definitely do things like that with https://github.com/jrevels/Cassette.jl/

I'll have to watch the talk on this. Thanks for the link.


>The purpose of multiple dispatch is extensibility, but the purpose of pattern matching is destructuring.

Agreed! ;) I just wanted to point out that I was able to use multiple dispatch elegantly in the unroll example, where you are using pattern matching.

>pattern matching compiles down to those low level operations

So I guess that means Julia could indeed satisfy you if we gave those low-level operations some more syntactic sugar? Have you seen http://kmsquire.github.io/Match.jl/latest/ ?
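
For readers following along, that sugar looks roughly like this (a small illustrative sketch of mine, assuming Match.jl's @match macro as documented):

    using Match

    # Destructure a tuple by shape, the way one would in an ML-family language.
    describe(t) = @match t begin
        (x,)   => "one element: $x"
        (x, y) => "pair: $x and $y"
        _      => "something else"
    end

    describe((1, 2))   # "pair: 1 and 2"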

>Let me just say that it is really difficult to know ahead of time how a particular language feature would affect your programming life

True story. I'll try to be more observant when writing code and see if I could actually solve things more elegantly with better pattern matching.

Btw, I'm writing an article about GPU programming in Julia - do you have a feature you'd like to have explained, or a killer feature you believe we can't do and that would impress you if we did?


You keep making assertions based on completely false assumptions about Julia and its type system. It might come across better if you posed these assertions as questions instead. A few counterpoints...

> One immediate benefit of a powerful type system that I'd like to point out is that Spiral allows reflection over tuples, which is a much more elegant solution to the problem of a function having variable arguments than the C-style varargs mechanism that I see Julia using in the examples you've shown me.

* Julia's type system can reflect on tuples just fine at both compile time and run time and we do so all the time. When tuple size and contents are statically knowable, there is no run time overhead.

* Julia's varargs are represented as tuples, which can be and are reflected on by the compiler. Julia's varargs have nothing to do with C varargs except superficial usage similarity.
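
For instance (a minimal sketch of mine, not from the thread): a varargs call packs its arguments into a tuple whose element types the compiler can dispatch on.

    # Dispatch on the shape of the varargs tuple; for concrete argument types
    # this is resolved entirely at compile time.
    first_is_int(args...) = _first_is_int(args)
    _first_is_int(::Tuple{Int, Vararg{Any}}) = true
    _first_is_int(::Tuple) = false

    first_is_int(1, "a", 2.5)   # true
    first_is_int("a", 1)        # false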

> Another benefit is that Spiral has no need for separate static array machinery. Literals can be part of a variable's type, and even when they are not, unless explicitly prevented they tend to be propagated through function call boundaries (join points).

* The fact that Julia's default array type doesn't include static dimensions is a design _choice_, not a type system limitation. Specializing on the exact dimensions of an array is _not_ usually what you want to do in general computational work. If it _is_ what you want to do, you can use the StaticArrays package.

* Literals can be type parameters in Julia too—this is done all the time. Indeed, as I said elsewhere in this thread, type parameters can be other types, integers, floats, symbols, tuples, and recursive constructions thereof.

* The StaticArrays package is implemented in pure Julia, demonstrating that the type system is perfectly capable of reflecting on tuples and having dimension sizes and tuples of them as type parameters—that's how the static array types are implemented. The compiler generates completely unrolled, fully inlined code for static array operations since the dimensions are known at compile time.
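
Concretely (a minimal sketch of the StaticArrays API, my example):

    using StaticArrays

    v = SVector(1.0, 2.0, 3.0)                # SVector{3,Float64}: the length 3 lives in the type
    A = @SMatrix [1.0 2.0 3.0; 4.0 5.0 6.0]   # SMatrix{2,3,Float64}

    A * v   # fully unrolled 2x3 multiply, no heap allocation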

> Assuming they are known at compile time, the inlining of tensor dimensions in loops is actually the default behavior in Spiral. I've yet to do benchmarking to see how helpful that is, but I am sure it would help reduce register pressure.

I understand that Spiral is targeted at a context (GPUs) where static arrays are a good default. The fact that Julia does not take that approach as a default is not a type system limitation (as proven by the existence of StaticArrays), it's a choice based on the fact that static arrays are not the right default for general numerical computing. There are many situations where being forced to know array dimensions at compile time is not helpful or even actively problematic.

> Your examples are not quite enough to get me to apologize for my rude behavior, but they are enough to get me to shift my views a little, so I'll change the offending sentence so it reflects reality more accurately.

I'm unclear on why any examples are necessary to apologize for rudeness. Being right doesn't justify being rude.

As far as I can tell, the relationship between Spiral and Julia's approaches is roughly:

* Spiral is static and primarily focused on statically known, fixed sized arrays and generating good code for GPUs.

* Julia is dynamic, and primarily focused on flexibility along with C-level performance on CPUs.

* Julia partially covers Spiral's target territory with StaticArrays and GPU code generation capabilities.

* Spiral does not, on the other hand, address (or aim at) the more general array computing that Julia supports.

Given the substantial overlap in capabilities, it seems like dismissing Julia out of hand for the application areas you're interested in is both unwarranted and unwise. You may want to be a bit less dismissive and instead understand it and learn from it. In particular, Spiral's type system is not more powerful than Julia's. They are different, since Spiral's is static and forces types to be known and checked at compile time (as far as I understand). Julia's type system relaxes this, allowing types to be unknown until much later—and as a tradeoff, checked much later. I have yet to see anything that Spiral's type system can express that cannot be expressed in Julia's. Of course, we're also cheating, since Julia doesn't do type checking, and therefore having a highly expressive type system doesn't really cost us much.


> Julia's type system can reflect on tuples just fine at both compile time and run time and we do so all the time.

Then why do I not see that actually being done anywhere? Reflection on tuples should work much like pattern matching on lists in functional languages. What is the point of that Vararg nonsense? I know that Julia has pattern matching - now it needs to take the next step and actually make use of it.

Is this Julia's way of making itself familiar to C programmers?

I see those `...` ellipses used as some kind of operator in both C++ (and Racket, ironically), and they are a horrible idea, as they are completely implicit and non-obvious in their function.

Using type membership tests + if statements is so '90s. Is Julia also trying to draw in the Java crowd here by trying to be closer to that language?

> <all those other points>

> There are many situations where being forced to know array dimensions at compile time is not helpful or even actively problematic.

It is not that Spiral enforces that dimensions be static. Rather, if they are known at compile time, then that information is merely propagated forward, including through function call boundaries.

You seem to miss what it really means to have first-class types together with staging in a language. Spiral's tensors can be arbitrarily static or dynamic in their dimensions, and can even allow some dimensions to be static (known at compile time) while others are dynamic. No friction results from this.

All this does not actually require separate implementations like in Julia. Both Spiral and Julia have Turing-complete type systems, but based on this I conclude that Spiral's is more expressive.

It would be trivial to force all the dimensions of a tensor to be dynamic, but why would one want to propagate less information during compilation? It is not as if the dimensions of tensors change that often, or as arbitrarily as common variables do.

Also, let me just state for the record that Spiral is intended to be more than a GPU language. A language capable of doing GPU programming elegantly was my motivation, but Spiral's feature set makes it uniquely suited for both very high-level and very low-level programming.

A language saying it wants to be as fast as C counts for very little; the question is how fast it will be once the code starts being really abstract. This is why the single benchmark that currently exists in Spiral's documentation is for parser combinators.

GPU kernels are not a good test bed for language speed because they do not use that many high-level features apart from the ones needed for tensors. I'd be more concerned with all the scaffolding needed to set up the kernel. And in fact, that was one of the major concerns for me back when I was doing an ML library in F#.

https://www.youtube.com/watch?v=YRoruJRmuLc&t=2537s Jeffrey M. Siskind – The tension between convenience and performance in automatic differentiation

In the field that Julia is aiming for, I'd rather see a benchmark for a CPU-based AD library that works on scalars, which is a real use case in scientific computing. Optimizing this is actually difficult, given that none of the mainstream functional languages can do it. I'd expect the same situation for Julia as with monads, where the inliner just gives up.
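
To be concrete about the workload I mean, scalar forward-mode AD in Julia looks like this (a minimal sketch using the ForwardDiff.jl package; whether it optimizes well is exactly the question):

    using ForwardDiff

    # Scalar forward-mode AD: f is evaluated on dual numbers, so every
    # primitive operation must inline cleanly for this to be fast.
    f(x) = sin(x) * x^2
    ForwardDiff.derivative(f, 1.0)   # cos(1.0) * 1.0^2 + sin(1.0) * 2.0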

> Of course, we're also cheating since Julia doesn't do type checking and therefore having a highly expressive type system doesn't really cost us much.

I do not know whether the code that was linked in these threads is a representative sample of Julia, but Julia's code looks to me much more like it was written in a static language than Spiral's does.

I think that language wars such as these are necessary in order to arrive at the truth. It is not that I am being rude on purpose; it is that rudeness in a battle is to be excused and seen as unavoidable. Being right is a process rather than a fact, because getting it down to a fact is rare, and getting the fact accepted is even rarer. Cooking a good meal requires flames.

Language semantics have indivisible algorithmic properties that set them apart from each other, and hence they cannot ever be equal. The best thing, therefore, is to enjoy the division. I see this as different from celebrating diversity.


I actually don't know anything about these sorts of things. My complaint is merely that you claimed Julia is not even worth comparing with, when you don't actually know how tensors are implemented in Julia or whether whatever it is you're doing here is possible in Julia. Instead you compare to PyTorch and assume Julia has semantics as shit as PyTorch's, which is false.

Perhaps someone who knows more can comment on whether or not Julia meets your criteria here, but either way, I think your presentation is misleading.


I cannot change my attitude, otherwise I would never get anything done. The only principle I hold dear, and the only way I can truly be fair, is that if I am actually wrong, I will change my mind. I would not be so rough on Julia if it didn't present itself as a direct competitor to Spiral. Since it presents itself as a numerical computation language capable of replacing C++ and dominating machine learning and GPU programming, I will hold it to the same standard I would hold Spiral to.

Any numerical computation language should have tensors generic in dimension, layout, and type. That is the point I am trying to hammer home. If you think that Julia's tensors are better than PyTorch's and feel that my equal treatment of them is unfair - I honestly feel that is the wrong way to think about this. Rather than fighting to be the king of a very small hill, it would be better to try climbing a much larger mountain instead. Look inward and be unhappy with what the language has now.

Envy Spiral's wonderful tensors. Wouldn't Julia be a much better language if it had Spiral's inlining guarantees? And maybe add to that the intensional polymorphism that would allow it to implement them.

Why, if that happened, I'd lose my reason for working on the language and have to do something else, like study reinforcement learning. How horrible would that be? :)


>Any numerical computation language should have tensors generic in dimension, layout, and type. That is the point I am trying to hammer home. If you think that Julia's tensors are better than PyTorch's and feel that my equal treatment of them is unfair - I honestly feel that is the wrong way to think about this. Rather than fighting to be the king of a very small hill, it would be better to try climbing a much larger mountain instead. Look inward and be unhappy with what the language has now.

It's still silly to compare Julia's tensors to PyTorch's, because they are generic in dimension, layout, and type, unlike PyTorch's, and these seem to be exactly the pieces that you're ragging on. This means that your beef with PyTorch is fine, but it doesn't necessarily apply to Julia. I am curious whether you can go into more detail about Julia's implementation instead of talking about possible Julia problems through an explanation of PyTorch.

>Wouldn't Julia be a much better language if it had Spiral's inlining guarantees?

Not necessarily. Julia is a dynamic, interactive language which is made to be used through a REPL. Although you can increasingly statically compile packages, the standard way to use it is interactive, and in that case you do have to watch your compile times closely.

Julia does have opt-in inlining guarantees through the `@inline` macro. It's opt-in instead of opt-out in order to stop unnecessary compile-time growth (inlining isn't always that helpful). To decide when inlining should occur, Julia uses a cost model:

https://github.com/JuliaLang/julia/pull/22210

This then tries to inline only what's necessary for performance. Of course, the fact that Julia can inline what it needs allows things like CUDANative.jl to force inlining and get the kernels it needs.
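
For reference, the opt-in hint mentioned above looks like this (a trivial sketch of mine):

    # @inline strongly encourages the compiler to inline this function
    # at its call sites.
    @inline saxpy(a, x, y) = muladd(a, x, y)

    saxpy(2.0, 3.0, 4.0)   # 10.0; for a function this small the hint is usually moot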

>And maybe add to that the intensional polymorphism that would allow it to implement them.

This sounds like a good idea. We should take this over to our community and see whether it would work well in the language.

I think the last thing to address is:

>Since it presents itself as a numerical computation language capable of replacing C++ and dominating machine learning and GPU programming, I will hold it to the same standard I would hold Spiral to.

Not really. Julia is dynamic and interactive. C++ with its full templating is very powerful, but it allows compile times to explode because its main use case is not with a REPL (and templated C++ compile times are...). While Julia's generics and speed have increased the domain of what's possible with a dynamic language, it is still made for a much different domain than C++. I would say it's not really in direct competition with Spiral either, since the interactive workflow is just so different. Julia's static binary generation is growing in power, but it will never be as strong as something made for static compilation, and C++/Spiral won't feel like an on-demand calculator where most users are punching in commands and getting instant feedback.

To me, Spiral looks like a cool alternative to C++/Rust/Go in the scientific computing domain, and package ecosystems that mix Spiral+Julia could be quite powerful.


> It's still silly to compare Julia's tensors to PyTorch's, because they are generic in dimension, layout, and type, unlike PyTorch's, and these seem to be exactly the pieces that you're ragging on. This means that your beef with PyTorch is fine, but it doesn't necessarily apply to Julia. I am curious whether you can go into more detail about Julia's implementation instead of talking about possible Julia problems through an explanation of PyTorch.

Well, no, I can't go into more detail. That is the responsibility of those who take up the challenge I made, so why don't you go into detail instead, in the context of a GPU map kernel? I am genuinely curious about this, but not so much that I actually want to dive deep into the language.

Apart from that, I do like the rest of your reply. It is true that the best performance lies somewhere between full inlining and full heap allocation, and that full inlining will kill compile times. One of the points I want to make with Spiral is that starting from the position of full inlining and then adding boxing and join points is a lot easier and much less error-prone than starting from the other end and inlining by hand. I feel it is important that other programmers realize this, along with the fact that sophisticated compiler heuristics for when to inline are the only alternative.

> This sounds like a good idea. We should take this over to our community and see whether it would work well in the language.

I know that Odersky thinks that pattern matching on types directly only really works in languages with simple type systems such as Spiral's. If a language uses abstract types, then that becomes more difficult. I definitely see a language having both intensional and extensional (parametric) polymorphism as a significant challenge. I'd love to see it met, because extensional polymorphism is a lot easier on the IDEs, and the way Spiral does it will make it difficult to create good tooling for it.

If Julia could support intensional polymorphism, then inlining guarantees would just fall out of that. It is quite powerful - the inlining guarantees Spiral gives do not just apply to top-level functions, but to any function passed as an argument and captured by other functions.

> Not really. Julia is dynamic and interactive. C++ with its full templating is very powerful but can allow its compile-times to explode because its main use case is not with a REPL (and templated C++ code compile times are...).

If C++'s templating system were a car, then it would be missing the wheels and the steering wheel. And possibly the brakes. How can one possibly do staging without join points and all the other things Spiral has? As a language it misses several key concepts that would actually make such a system work well.

With the last paragraph I half agree with you.

It is true that right now Spiral uses a compile-run loop and acts as a sophisticated code generator, but it could be made into a caching interpreter with some work. It probably will have to be, if it gets bigger, for the sake of compile times and library support.

Getting instant feedback in the REPL is hardly the domain of dynamic languages anymore - F# can do it well, for example, and so can various other static functional languages. Even C# can do it these days.

> To me, Spiral looks like a cool alternative to C++/Rust/Go in the scientific computing domain, and package ecosystems that mix Spiral+Julia could be quite powerful.

Julia does a lot of multi-stage compilation. It has probably had a lot of work done on its JIT by now. And language mixing, especially of languages with different strengths and weaknesses, is a great idea. Staging gives the user access to it in a more direct manner, so why not bring it out?

Spiral is quite a small language, so if the JITing abilities are flexible enough, it might be a good idea to implement it in Julia as a DSL and merge it with the main language. That would combine all the advantages of Julia and Spiral as languages.



