> For example, a trivial hello world program in Julia runs ~27x slower than Python’s version and ~187x slower than the one in C.
I don't think it makes any sense to speak of "__x slower" for hello-world. Clearly, this is just a benchmark of startup time, so you only pay it once per program. It should be reported as "__ms slower".
Julia startup (according to this post) takes 371ms. That's 357ms slower than Python, and 369ms slower than C. Faster is always better, but this doesn't seem so bad to me.
For comparison, on my old workstation here, starting a Swift repl takes 2724ms, and starting a Clojure repl takes 4792ms.
Sure... and it means that Swift and Clojure are just as useless as Julia for numerous use cases where start-up times matter, like piping CLI commands together; in Julia's case you're looking at a pathetic 3 execs per second.
Moreover, even in some server-side applications it's super neat to have the luxury to spawn a new process to service certain requests, not having to worry about memory leaks. It's a perfect "API" which allows multiple languages to interact together.
It always bothered me when people dismiss the start-up time by adding "just" in front of it. We're not talking about web frameworks, these are _general purpose_ programming languages, and horrendous start-up time automatically disqualifies them from being general purpose and places them into a niche category, in my humble opinion.
> it means that Swift and Clojure are just as useless as Julia for numerous use cases where start-up times matter
Or that different languages require different approaches, and you can't translate 1:1 between technologies. My Mac starts up 27x slower than my C=64, but that doesn't make it useless for any task that requires turning it on.
> We're not talking about web frameworks, these are _general purpose_ programming languages
Are we? I've never written a line of Julia in my life but the impression that I get is that this isn't intended to be a general-purpose language:
> Of course, one might argue that Julia is not intended to be a general-purpose programming language, but a language for numerical computing.
> As has been pointed out elsewhere “Base APIs outside of the niche Julia targets often don’t make sense” and the general-purpose APIs are somewhat limited.
The Julia webpage has 6 tabs listing features, and 5 of them are about numerics. The 6th says "General Purpose", but it's mostly about FFI.
Julia is intended to be (and already is) a general purpose programming language. What it does not try to be is a language that tries to be good for every purpose. Python for example is a general purpose programming language but you wouldn't use it to write kernel modules, and likewise you wouldn't use C to write your simple website (even though you obviously can). A language that tries to be everything to everyone will either be too bloated and complex, or lack the included batteries for pretty much anything.
Even if we don't get into technicalities, Julia is a very powerful language that lets people extend it for purposes it was not built for. If it didn't support JSON, you could write a JSON type as integrated and fast as what the stdlib offers; its metaprogramming/multiple dispatch paradigm makes it easy to create frameworks for many purposes, and while numerical processing gets special attention, the language will still outperform most dynamic languages in terms of speed in any domain.
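For instance, here's a minimal sketch of the kind of extension multiple dispatch makes trivial (the Money type is purely hypothetical, not anything from the stdlib):

    struct Money            # hypothetical user-defined type
        cents::Int
    end

    # extend Base generics via multiple dispatch; Money now works in generic code
    Base.:+(a::Money, b::Money) = Money(a.cents + b.cents)
    Base.show(io::IO, m::Money) = print(io, "\$", m.cents / 100)

    sum([Money(150), Money(250)])   # Money(400), displayed as $4.0

The JSON case is the same idea at a larger scale: you define the types, extend the existing generic functions, and the compiler specializes everything as if it had been built in.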
And I do think short scripts are within Julia's main targets; they are merely victims of the fact that the language is still young and of the battles the devs chose to fight within that limited time (either AoT options or an interpreter that runs code while it's compiling could solve it really well, for example, but both would require a lot of time and work that could be spent on other features).
The poster said "(even though you obviously can)".
People will try all sorts of weird language experiments, so it's no surprise those libraries exist. But do you really expect a significant number of web developers to shift over to C, just because it's technically possible?
I did generalize above, but when I said "simple website" I really meant the usual simple website on the web, not firmware/embedded stuff. I kinda said that because I already wrote one in C, including a basic CGI library, as well as a mid-sized one with Rails, and the difference clearly falls into "C doesn't try to be good at website creation, but you can obviously do it" (and on some occasions you just have to). I'd also add that Roller Coaster Tycoon was written in assembly, but assembly is not a language that is particularly good for game dev in general (while still being a completely general purpose programming language).
Micropython is not really Python (which is by definition defined by the PEPs and the CPython implementation); it's a language largely similar to Python 3, since you can't just pick an arbitrary Python program or library and run it there at all times. But it's not really a relevant discussion (and it's nice to have multiple variants of a language you like that are good for multiple purposes, which solves the bloat problem as long as you use only one of them at a time).
Regardless, I hope Julia gets to the point that you can target as many places as some of those languages even if it's not nearly the best in each domain (like WASM, embedded, OS stuff, shared libraries).
Actually, assembly is how most 8- and 16-bit titles were written; higher-level languages were the Unity of the '80s-'90s gamedev scene, so Roller Coaster Tycoon isn't alone.
Quoting the relevant bit, "you wouldn't use C to write your simple website": except that is exactly what EEs do when on-device memory is measured in KB.
C++ would be safer, but the C89 culture reigns in such domains.
Not necessarily. It's entirely possible to keep a daemon running with the loaded runtime to speed up start time. I'm not arguing whether these languages are suitable for scripting; rather, startup time alone need not be a disqualifying factor. If you evaluate holistically and arrive at a language, you'll always have one thing or another that is not quite what you want.
A few years ago, as a hobby project, I designed a programming language and wrote an interpreter for it in Java. (I've never released it or shared it with anyone; I was never quite happy with it and eventually moved on to other things.) As you can imagine, an interpreted language with the interpreter written in Java is going to be rather slow to start.
So, I implemented exactly the solution you describe here. I made my interpreter run as a daemon and listen for requests on a Unix domain socket (with a custom binary protocol). I then wrote a client in C, which opened that Unix domain socket and sent it a script to run. The C program also redirected stdin/stdout/stderr to/from the interpreter through the Unix domain socket, if connected to a tty it could switch to/from raw mode, it notified the server if certain signals occurred, etc. Using the interpreter this way, startup time was a lot faster. This approach can be applied to any language which supports runtime evaluation of code.
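For what it's worth, the same pattern is only a handful of lines in any language with runtime eval. A rough sketch of the daemon side in Julia (hypothetical socket path, one expression per connection, no security hardening):

    using Sockets

    const SOCKET_PATH = "/tmp/interp.sock"   # hypothetical path

    # keep the warm runtime around and evaluate lines sent over the Unix domain socket
    server = listen(SOCKET_PATH)
    while true
        conn = accept(server)
        @async begin
            src = readline(conn)                      # client sends a single expression
            reply = try
                string(include_string(Main, src))
            catch err
                "error: " * sprint(showerror, err)
            end
            println(conn, reply)
            close(conn)
        end
    end

A client (which could just as well be a tiny C program, as above) then connects and round-trips a line:

    conn = connect(SOCKET_PATH)
    println(conn, "1 + 1")
    print(readline(conn))    # "2"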
A long-running background interpreter carries significant security risks, though. A process that lacks most Linux capabilities could use it to regain access to things it shouldn't have access to. A process could interpose itself and get access to data and code it shouldn't. If there are multiple users involved, one could bypass ACLs and similar. Essentially, you're removing a significant proportion of the security guarantees the OS provides around isolated processes.
Some of the concerns you raise can be addressed. For example, make the daemon and socket per-user and give the socket 600 permissions. Also, the SCM_CREDENTIALS message (on Linux) or LOCAL_PEERCRED/getpeereid (macOS/*BSD) can be used to validate that the caller has the expected UID.
The issue with processes missing normal capabilities is more difficult to address. One possibility, at least on Linux, would be to get the pid from the SCM_CREDENTIALS message, and then read /proc/$PID/status to check capability bits. It could default to denying access to processes with fewer capabilities than it itself has.
(SCM_SECURITY can be used to pass the SELinux security label from client to server, which could also be used as a security measure; maybe the server could refuse access to processes running with a different label, or have a whitelist of allowed labels; if SELinux is being used to sandbox a process, that would prevent it from accessing the background interpreter unless that was explicitly allowed by whitelisting.)
It doesn't work very well. I got SEGFAULTs every time I tried; filed a bug report; no resolution. So, while it does exist, it's not a high priority for the Julia team, so I wouldn't rely on it.
I had the exact same experience when I tried over a year ago. The issue was eventually closed IIRC, but a recent newsletter claimed it was fixed.
How big are the resulting binaries these days? It’s been some time since I tried this, but I decided that the size of the compiled code didn’t fit my use case (a serverless function IIRC)
Julia's default model is that script functions are only JIT compiled when run (just like the JVM, etc.), but you can just as easily force the compile and stick it in an image.
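With PackageCompiler.jl that looks roughly like the following (package name and paths are placeholders, and the exact keyword API has shifted between PackageCompiler versions):

    using PackageCompiler

    # bake heavy dependencies into a custom system image so they load already compiled
    create_sysimage([:Plots];
                    sysimage_path = "sys_plots.so",
                    precompile_execution_file = "warmup.jl")   # a script exercising the code paths you care about

Then start Julia with `julia --sysimage sys_plots.so` and most of the JIT cost for those packages is gone.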
Have you actually tried the instructions in there? I did, to get a better time for julia in the n-body programming languages shootout https://benchmarksgame-team.pages.debian.net/benchmarksgame/... (I am the current leader among julia implementations). Suffice it to say, it did not work. I mean, I'm not a brilliant programmer or anything, but I would hardly say it's "just as easily".
I am a Julia noob and managed to use PackageCompiler.jl for a package I wrote and it just worked out of the box.
I think the only non-stdlib package I used was StaticArrays, but I was able to use the happy path described in the PackageCompiler docs and got it working in an hour or so max. This was probably in March or April, so relatively recent.
Totally, this is why I take the right-tool-for-the-job approach. I would not start up a Spark cluster to process 100MB of data, and I also would not write a CLI tool in Clojure (even though it is one of my favorite languages).
I would write a CLI in Clojure if it were a reasonable option. Something like Gambit or Chicken but for Clojure would be a dream. I guess there’s Ferret, so maybe I should give that a try.
You got me thinking about that term, CLI. I know that usually means a Unix shell, but when I just run my bespoke file conversion tool from the CIDER REPL, that interface is also a command-line one; what else would it be?
> Sure... and it means that Swift and Clojure are just as useless as Julia for numerous use cases where start-up times matter, like piping CLI commands together, in Julia's case you're looking at pathetic 3 execs per second.
The REPL is a tad slow but compiled Swift binaries have virtually no startup time overhead.
Eh? Swift surely doesn't fall in this category, as it's meant to be used compiled, not from a REPL, and doesn't have a massive heavy runtime like Clojure. I refuse to believe that a Swift CLI compiled to native code as intended has a startup time problem.
I’ve used it a few times in the past experimentally. It definitely improved startup time. It was hard to set up though, so I generally didn't find it useful.
I’ve become disenchanted with clojure over the last couple years, so I haven’t tried anything recently.
> If you ignore startup time, Julia might have good performance for simple array/matrix operations and loops, but we already know how to make them fast in Python and other languages.
> And it’s not just scripts, Julia’s REPL which should ideally be optimized for responsiveness takes long to start and has noticeable JIT (?) lags. What’s even more worrying is that there doesn’t seem to be much progress there. The REPL was a pain to use a year ago and it still is.
> In addition to that, Julia programs have excessive memory consumption. The above hello world example in Julia uses 18x more memory than Python and 92x more memory than the C version.
> The above hello world example in Julia uses 18x more memory than Python and 92x more memory than the C version.
Again, I'm left wondering if this is proportional, or simply an X megabyte overhead of the runtime. Maybe it's even shared among all running programs, as some runtimes are. Or it looks like maybe stdio could be just particularly bad in Julia, and in practice I've never written a program for any environment where that was my performance bottleneck.
Performance is not easy to measure, and not easy to report. A single number makes a good headline but it's really not enough information.
Which, by the way, only applies to Julia programs under the JIT. The AoT compiler will handle nearly 100% of what the JIT or REPL will and vice versa, with a goal of 100%.
You can't compare starting up a statically-typed, compiled Swift repl to those other dynamic, interpreted languages -- it's doing a lot of things that running a normal, compiled Swift program wouldn't do, and that dynamically-typed Clojure repl doesn't do (even though it's slower).
The Swift repl is essentially a debugger running a compile-run cycle on each entered expression. Some statically-compiled languages have slow compilers but extremely fast runtime execution (Rust, and to a lesser extent C++, come to mind).
Hardware is so complex and there is so much variation in it that this is "obviously" not true.
As a simple example, if two computers are comparable in speed but one has a bigger cpu cache or faster ram, they might run programs with low memory footprint at the same speed, but one will be much faster when running a program that uses a lot of memory.
But people who like the language will invariably use it beyond its core competency.
Hence it is important to ensure that julia is "kinda mediocre" for CLI scripting (big step up from "absolutely terrible"). Personally, my familiarity with julia and its features outweighs the fact that python/perl/bash would be the better tool for many CLI scripts.
I think it's disappointing that Julia is not meant for CLI scripting. IMO it has failed to live up to its manifesto (https://julialang.org/blog/2012/02/why-we-created-julia) of being a general-purpose language that can "have it all".
My own bioinformatics work involves parsing massive amounts of text data, and Julia is excellent for this -- it is very fast, yet high-level and easy to write. However, bash pipelines are also a huge part of bioinformatics, and Julia scripts are not very good for this due to the long startup time, which is a shame.
> I think it's disappointing that Julia is not meant for CLI scripting. IMO it has failed to live up to its manifesto (https://julialang.org/blog/2012/02/why-we-created-julia) of being a general-purpose language that can "have it all".
I think it's still early in the language's development to say that. There isn't any fundamental reason Julia couldn't be tuned to be better at CLI scripting. Currently it's what, 400ms to start up, which isn't too terrible but could be made faster. I could see using the Julia debugger as an interpreter for CLI scripts. Or, if you run a script a lot, it's possible to have Julia compile an executable. Personally I use it mainly via notebooks or a repl.
I helped work on getting that number from ~400ms to ~150ms maybe 18 months ago for the v1.0 release. FWIW, a big part of it was shedding a couple of excess C libraries with bad load times (usually now lazy-loading them when required instead). The next big jump will take more internal effort, but I don't think there's any serious showstopper. The bigger short-term interest, though, has been towards reducing the latency of loading external (user) code/libraries.
For an example of what I mean by the latter, python seems to be pretty fast initially, but then seems to take a huge hit from just trying to get numpy loaded.
$ time python -c '0'
real    0m0.024s
user    0m0.016s
sys     0m0.008s
$ time python -c 'import numpy'
real    0m0.165s
user    0m1.508s
sys     0m2.312s
$ time ./julia -e 0
real    0m0.215s
user    0m0.240s
sys     0m0.144s
I believe you compared first (uncached) startup of Python against second startup of Julia:
$ time python -c '0'
python -c '0' 0,03s user 0,01s system 7% cpu 0,467 total
$ time python -c '0'
python -c '0' 0,03s user 0,00s system 98% cpu 0,030 total
$ time python -c 'import numpy'
python -c 'import numpy' 0,17s user 0,05s system 9% cpu 2,401 total
$ time python -c 'import numpy'
python -c 'import numpy' 0,11s user 0,01s system 99% cpu 0,118 total
$ time julia -e 0
julia -e 0 0,25s user 0,27s system 17% cpu 2,868 total
$ time julia -e 0
julia -e 0 0,09s user 0,05s system 92% cpu 0,155 total
The impact of actually calling something from numpy is also negligible in Python but not in Julia:
$ time python -c 'import numpy; numpy.random.rand(10,10)'
python -c 'import numpy; numpy.random.rand(10,10)' 0,10s user 0,01s system 99% cpu 0,116 total
$ time julia -e 'rand(10,10)'
julia -e 'rand(10,10)' 0,35s user 0,23s system 209% cpu 0,277 total
$ time julia -e 'rand(10,10)'
julia -e 'rand(10,10)' 0,36s user 0,22s system 209% cpu 0,278 total
We got approximately the same numbers (your clock speed is likely to be much higher). User/system time is quasi-bogus, since it’s a high core count system (although still a bit concerning). I accounted for possible cache effect by running each a number of times and reporting the last. I’m not really trying to make an absolute time comparison here, just pointing out that if 100ms is unacceptable, numpy would just miss that bar too. Once you’re past the bar of “this needs to be kept running”, I don’t think a constant factor of 100ms vs 1s makes much difference in QoL, and now we’re just comparing apples and oranges. A constant factor gain on the rest of the time can make a huge difference on the rate of results per second. But I actually hope both will improve!
Ah. I was trained by zsh's time reporting to focus on the last figure and noticed that "real" is at the top in your comment only after posting mine. And then still left scratching my head looking at "user" and "system" times an order of magnitude higher than "real".
Excellent! I was mainly going off the numbers given by another comment. Well, ~150ms seems within usable CLI times for me. One note: I'd reckon that for a fair comparison with the NumPy timings, the Julia example would need to use an array type to do something.
By "slow startup time" I also include the time to import libraries which is really the bigger problem. For example loading the DataFrames library takes quite a long time, and I've heard the libraries for plotting take even longer to import.
That’s totally unrelated to startup or compilation time. It’s due to a bug in the crufty old Windows CMD.exe terminal—if you use a real terminal it doesn’t happen.
I’m now actually curious - what’s the bug in cmd.exe and why doesn’t that happen with other REPLs? I’m well aware that cmd.exe has its fair share of weird bugs, but on Windows it’s sometimes the only thing available. Plus, every third party terminal basically has to interface with the Windows GUI command-line program via screen-scraping, and so they inherit many of the same bugs. (The Windows Console architecture is fascinating - see https://devblogs.microsoft.com/commandline/windows-command-l... for a rundown!)
If we knew that it would be fixed by now. Julia's REPL only supports VT100 terminals and its descendants. My guess is that CMD.exe on older Windowses fails to emulate VT100 correctly somehow.
I want to know what business anything running on a modern computer has taking that much time to start up. Aside from the fact that 5 seconds is an insane amount of time, this is before any user code has been loaded, so presumably your Clojure repl is executing the same 5 seconds of work every time it starts. Can't that be cached?
Developing tools for devs sucks, as they will use "hello world" as a benchmark. And the tech stack will be decided by the non-technical founder. Devs are more concerned about what tools others use rather than the pro et contra. /rant
I haven't understood it as a benchmark, rather as a smell: "they did not bother to optimize this?". Computers are so fast today that instant startup time is low-hanging fruit for a language.
But it's an entirely 100% meaningless point of discussion though.
My peers and I use Julia to run numerical heavy code where the large majority of the time is spent inverting a large matrix to solve the equation Ax = b for x. It's a lot more complicated than this, but essentially it can be boiled down to this main equation. When my code takes hours, pushing onto days, to run, why should I care about 300 ms of startup time at all?
I stopped reading the article because their first point (the hello world benchmark) is entirely meaningless and is just noise in the wind. Their second point is a matter of personal opinion. After these two I didn't bother reading the rest.
I think a lot of people who set out to design languages don't start out with fast compile times, fast startup, and robust hooks for tooling as their top three design goals. The problem is that if you don't start out with those as primary design goals, you're going to be totally hosed once they become a problem later.
I think the language benchmarks game or the techempower benchmarks are fairly widely known and are a step or two above comparing hello world performance.
This article is from 2016. The OP probably should indicate that in the title. As far as I have skimmed the article, some of the claims the authors make have already been resolved in Julia. I haven't used the language yet, but I will soon in the next couple of weeks.
Indeed, and while there is plenty to still be worried about, a lot has been addressed, too. Further, we are now starting to see the pay-off, with amazing algorithmic innovations being implemented in Julia.
We have working code-to-code automatic differentiation:
Like other people, the startup isn't much of a problem for me, since I can easily adapt to (and even prefer) REPL-based development, and for the stuff I use Julia for, that one-time cost is more than worth it. But the problem with the initial JIT lag is that it shapes the first impression of the language, which probably has a significant effect on adoption and on word of mouth from people who didn't stay long enough to appreciate the compromises being made.
Julia language development seems to be guided towards avoiding damaging mistakes in the long run (for example, focusing early on having stable programmable multi-stage compilation, very efficient union types and state-of-the-art multithreading, so the library ecosystem can grow with the correct assumptions). But the "time to first plot" is something that can be solved (regardless of it being hard or easy) at any time "with no consequences", since libraries written with the expectation of a slow startup will work just as well when that problem is solved. The compiler team did acknowledge the issue and is prioritizing it post 1.3.
I do wonder if focusing on doing the very simple things well first (like python does right now), would be a better strategy, or it would just make people ignore the language because the simple things are already covered everywhere and not interesting enough for bringing people (compared to the amazing stuff Julia can do with supercomputers/clusters and state of the art stuff like Zygote).
>I do wonder if focusing on doing the very simple things well first (like python does right now), would be a better strategy, or it would just make people ignore the language because the simple things are already covered everywhere and not interesting enough for bringing people (compared to the amazing stuff Julia can do with supercomputers/clusters and state of the art stuff like Zygote).
This is just the difficulty of open source. I am a mathematician by trade, so I work on DifferentialEquations.jl, Zygote.jl, new forward-mode AD, and showcase this stuff in neural differential equations as part of my work. While it would definitely be nice if someone focused on startup speeds, it's not going to be the DiffEq/Zygote people, because it's not the stuff that I/we know. Julia has tons of great scientists and mathematicians, so the libraries in that area are pretty fantastic already, but we do need to find some people who know how to write apps and do devops. I plan to actually find and hire some devops people to help out Julia here. Julia has shown from its libraries that it's worth supporting, so now we should tie a bow around it for less hardcore folks.
That would be great if Julia can break into this area. I would really love to be able to easily write a Julia program that reads realtime input from Kafka (possibly a more mature RDKafka.jl, with multithreaded consumers), merges with data from postgres and redshift sources, sends for distributed processing using JuliaDB, which I can then integrate with Flux/Zygote models, all the while sending all metrics for Prometheus. All that with a custom admin API to access the data and manage the models and being easy to deploy in a cloud service.
> Text formatting and unit testing are two areas that should be relevant to almost any project
This is just blatantly false.
I love Julia for statistical analysis.
I never needed speed when printing out stuff.
REPL should be much faster (especially importing libraries), and I'm not a fan of 1 based indexing, but everything else is awesome if you need great performance.
Also I'm missing a great Julia native graphing library with zooming support (Matlab's is far better).
Production binaries are very hard to make, but for statistical research I don't know any better software.
Also for unit testing asserts are good enough for me.
> REPL should be much faster (especially importing libraries), and I'm not a fan of 1 based indexing, but everything else is awesome if you need great performance.
Then don't you want fast printing / string ops? REPLs heavily rely on that.
The implicit assumption in much of the Julia world is that the work being done on the dataset will dwarf the startup time; if your workload takes less than 100ms, it's probably not worth spending tens of milliseconds on compiling your code in the first place, and you should run it with an interpreter. (Incidentally, Julia does support that, but it requires a command-line flag, and it can come with a heavy performance penalty if you are running numerics-heavy code.)
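If I remember the flags correctly, that interpreted/low-latency mode is along the lines of:

    $ julia --compile=min -O0 script.jl    # minimize JIT work: quicker to start running, much slower in hot loops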
R is a hybrid of a very slow language and statistical algorithms written in C that can't be debugged. Also, the libraries don't have namespaces and aren't strongly typed, and there are many other features and libraries that show their age.
My favourite examples of Julia's great type system are its automatic differentiation libraries, which take a native function and wouldn't be possible in R.
Lots of the complaints in the article seem a bit contrived or not too relevant. To mention a few:
Performance: It's been mentioned by others that JIT and Julia is a thing. An important point that many here seem to be unaware of is that you can compile your Julia code [0], and then end up with a fast Hello World, if that kind of thing floats your boat.
Language style: This is more or less a subjective thing, right? However, I've found that optimising for JIT forces me into writing short and pure functions. I.e., optimising my Julia code for speed also forces me to write clean code. This is a really nice byproduct of the language design.
Libraries: The complaints seem somewhat thin: there is no mention of the type of unit tests that the author is missing, just a complaint that the library is less featured than some in C++ or Java. Is comparing a pre-1.0 language's unit testing libraries to those of C++ and Java a fair thing anyway? Finding that the generated instructions of a print statement are too long for your liking also does not seem to me to be a fair criticism of the language libraries.
Development: Complaining about the codebase being a mishmash seems unfair to me as well. I find myself browsing source code in Julia much more than other languages. With Julia I can just dive in and generally find that what's relevant to me underneath is also Julia.
These are some strange complaints. As a C++ developer optimizing numerical code, I have never once worried about any of the things mentioned here. I care about instruction folding, vectorization, loop reordering, and efficient instruction emission. If your optimized program is worried about startup latency of 100s of ms, you are doing something very very wrong.
You don't really need the `@simd` macro, though. The compiler is pretty good at vectorizing on its own now.
From the docs: "In many cases, Julia is able to automatically vectorize inner for loops without the use of `@simd`. Using `@simd` gives the compiler a little extra leeway to make it possible in more situations."
> One-based indexing is another questionable design decision. While it may be convenient in some cases, it adds a source of mistakes and extra work when interoperating with popular programming languages that all (surprise!) use 0-based indexing
The benefit is that it's more familiar to scientists who have experience with Mathematica, MATLAB, R, or Fortran. It's fair to compare the pros and cons of this choice, but you have to at least mention the pros.
The revulsion a lot of folks express at one-based indexing is always bizarre to me. I write code in C, Python and Matlab. Switching between these is really not that hard. The two indexing models just seem to be convenient/painful for different things.
And yes, perhaps one based indexing introduces a class of bugs when you need to call into c. But zero based indexing has the same problem if you need to call into fortran, and calling fortran is really common for numerical code.
I agree and I would have been a shrill critic of any 1-based indexing.
Most of the classic linear algebra algorithms seem to be described using 1-based indexing and the last time I needed to use one of these algorithms, I stubbornly tried to translate everything into zero-based indexing which was more difficult than you'd imagine and it was difficult to have confidence that I had correctly captured the algorithm.
I switched to using a 1-based matrix implementation and everything became trivial. It's not that hard to switch between looping from 0 to n (exclusive) and looping from 1 to n (inclusive).
The issue is whether you want to cater to established conventions (both in mathematics where 1-based indexing is typical, but non-mathematicians will also similarly use 1-based counting), or you want to cater to what is most logical.
I suspect the main reason 1-based seems convenient for some things is because of convention. Note that 0 wasn't even really used in mathematics until hundreds of years after our system of counting years was created.
Since our year counting is 1-based, we end up with odd things like "2019" being the "19th year" of the "21st century" as opposed to "2018" being "year 18" of "century 20". I suspect it's also fairly unintuitive that the current century began in the year "2001" rather than "2000".
Pretty much all languages designed for mathematics use 1-based indexing. Mathematica, R, Matlab, Fortran, etc. Either people have to think that the designers of these languages all made a mistake, or realize that it makes much more sense for mathematical computing to follow mathematical standards.
Is it possible that mathematics got it slightly wrong? The whole concept of 0 is relatively recent. Plenty of mathematics comes from before its inclusion, so presumably the idea of maintaining convention was there for successive mathematicians too.
It's not about right or wrong; they just work for different things, but programming languages, unlike math or human languages, have to pick only one as the default. 1-indexing is good for counting: if I want the first element up to the 6th element, then I pick 1:6 which is more natural than 0:5 (from the 0th to the 5th). 0-indexing is good for offsets: for example, I'm born in the first year of my life, but I wasn't born as a 1-year-old, I was born as a "0 year old".
And since pointer arithmetic is based on offset, it wouldn't make sense for C to use anything other than 0-index. But mathematical languages aren't focusing on mapping the hardware in any way, but to map the mathematics which already uses 1-index for vector/matrix indexing. You can see the relation of languages in [1].
If you want to write generic code for arrays in Julia, you shouldn't use direct indexing anyway, but iterators [2], which allow you to use arrays with any offset you want according to your problem; and for stuff that is tricky to do with 1-indexing, like circular buffers, the base library already provides solutions (such as mod1()).
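A small sketch of that generic style (hypothetical function; it works for OffsetArrays or any custom axes):

    # no literal 1 or 0 anywhere, so it works regardless of where the indices start
    function mymaximum(a::AbstractArray)
        m = first(a)
        for i in eachindex(a)
            @inbounds m = max(m, a[i])
        end
        return m
    end

    # and mod1 handles the wrap-around that circular indexing would otherwise make awkward
    buf = collect(1:5)
    buf[mod1(7, length(buf))]    # position 7 wraps around to index 2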
> 1:6 which is more natural than 0:5 (from the 0th to the 5th)
This is again just begging the question. When you want to refer to the initial element as the "1st", it is due to the established convention of starting to count from 1. The point is that the reasoning for starting from 1 might only be that: conventional, not based on some inherent logic.
You start counting with 1 because 0 is the term created later to indicate an absence of stuff to count. If I have one kid, I start counting by the number one, if I have 0 kids I don't have anything to count.
But then I agree that there is no inherent logic, math is invented and not discovered, and you could define it any way you want. If we all had 8 fingers we would probably use base 8 instead of 10 after all.
Actually we naturally count from 0, because that's the initial value of the counter.
It just so happens that this edge case of 0 things doesn't occur when we actually need to count something. Starting from 1 is kinda like head is a partial function (bad!) in some functional programming languages. Practicality beats purity.
Does it matter if it's wrong? In mathematics it's a pretty standard, if not written, convention that for example the top left corner of a matrix has the position (1, 1) and not (0, 0). If I read an equation and saw an "a3" in it I can safely assume that there exists an a1 and an a2, all three of which are constants of some sort. I can safely assume that there does not exist an a0, because this just isn't the convention. And furthermore, when I do encounter a 0 subscript (e.g., v0), it is implicitly a special value referencing some reference value or the original starting value. This is different than if I were to see a 1 subscript, such as v1. For example, take the equations
f = v0 + x
f = v1 + x
Those are the same equation, right? Sure, but when I see v1 I'm not really sure what it is or could be, whereas if I saw v0 I could assume it may be the initial velocity, which I can look up.
I have started to learn Julia recently after following the news for some time. The addition of a debugger finally convinced me to give it a try. For the record, I have been using Python for scientific programming since before numpy existed, and also a little Matlab.
Since my experiments are just some simple implementations of kmeans and epsilon-greedy bandit algorithms, take what I'm going to say with a grain of salt. Anyway:
I find Julia very interesting. I managed to make kmeans fast with type annotations. If I were to summarize, I would say that Julia is a much better Cython. I never managed, for example, to debug Cython. Profiling also seems to work in Julia, another thing very hard to do in Python. On the other hand, as in Cython, sometimes you don't know if some missing type annotation is slowing your program. It remains to be seen if the "verbosity" of type annotations is going to help or slow its adoption.
>sometimes you don't know if some missing type annotation is slowing your program
The lack of type annotations won't slow down your Julia program, since the compiler will infer them anyway (types are for multiple dispatch, documentation or to assert types, not speed). What will affect it is if the type can be inferred or not. For example, if you have a variable that is sometimes an int, sometimes a float, sometimes a string it will force the compiler to put the checks on runtime, dropping performance to CPython level (although the compiler optimizes small unions, such as Union{Int, Nothing} for a nullable Int). That's what the community calls type-stability, and the first step of profiling a function is usually using the macro @code_warntype to see what the compiler is inferring. See:
Be sure to read the sibling comment from ddragon; it’s not necessary to explicitly write type annotations, that should never increase performance. It only alters when a particular function can be called. The “secret sauce” in Julia is the aggressive type inference that gets run on non-typed code, determining the types (when possible) of your entire program.
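To make the type-stability point concrete, here is a tiny sketch of the kind of thing @code_warntype (mentioned above) flags; the function names are made up:

    # unstable: x starts as an Int and becomes a Float64, so the variable is Union{Int64, Float64}
    function half_n_times_unstable(x::Int, n)
        for _ in 1:n
            x = x / 2
        end
        return x
    end

    # stable: convert up front so inference sees Float64 throughout
    function half_n_times_stable(x::Int, n)
        y = float(x)
        for _ in 1:n
            y = y / 2
        end
        return y
    end

    # @code_warntype half_n_times_unstable(1, 10) highlights the Union; the stable version does not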
The article was written in 2016 (from the URL). Have some of the authors criticisms been addressed in the latest release? I know that startup time for Julia is now much faster (and also faster than my IPython profile from anaconda).
But the main point is that the critique is misguided in its generality. If the author cares about running many small scripts that each take a handful of milliseconds, then julia is just not the right tool for his job. No need to write overly general angry posts like "julia is slow"; instead write "julia has sluggish startup time". This is to some extent unavoidable, since julia has a quite heavy runtime (need to load llvm). For some workloads, bash / python / perl are more appropriate tools.
To give you an example, `$ time julia -e "print(5)"` gives me about 230 ms, compared to python 35 ms.
The language is designed for longer running programs that compute heavy stuff. And it performs very well at its intended use.
JIT overhead/startup time is still comparably large. There is https://github.com/JuliaLang/PackageCompiler.jl that helps reduce this overhead in user libraries. The base library precompiles quite a few methods already, so the performance deficit relative to C and Python on Julia 1.2 is half that quoted in the article, and unchanged by statically compiling.
Personally, I do a lot of computational geometry in Julia and I really don't care so much about these kinds of small overheads since actual computation time is the dominant factor. I imagine if Julia was designed for scripting in Unix environments this would be a bigger deal, but I think most people in the Julia community care more about how to manage several gigabytes of data in RAM/cache and run some analysis quickly, e.g. composable multithreading in 1.3.
The article may be old, but as of 6 months ago (the last time I tried Julia), the complaints about the JIT were still valid. Typing something into the repl with a syntax error took tens of seconds to produce an error. Creating an array with 3 elements took over a second. Plotting took forever. It was a very frustrating experience.
As one of the Julia developers; this is quite atypical. We’d like to get a bug report on our GitHub tracker from you if you’re willing to open one. Anecdotally, on my 2018 MacBook Pro, full startup of Julia, compilation and execution of a syntax error, and cleaning everything up, takes about 0.8s. (Measured with “time julia -e ‘foo foo foo’”). That’s not a time to brag about, but it’s an order of magnitude faster than your comment. Your system may be slower about certain things, but tens of seconds is way far out of the distribution of reasonable times.
Creating an array of three numbers is much faster; on my system (subtracting startup time) it’s less than 50ms, and that’s all because of compilation time. After running it once (so as to compile the random number generation and array construction routines) constructing a random array takes ~4ns.
Again, we’d like to see an issue opened in our github tracker to help figure out whats going wrong. Feel free to open one at https://github.com/JuliaLang/julia
Fast startup, high throughput, high productivity: Choose two.
C/C++/Fortran take fast startup and high throughput, python takes fast startup and high productivity, and julia takes high productivity and high throughput.
Julia is suited for long running processes, be it a simulation/calculation that will run millions of times for hours or a long interactive session for example exploring multiple variants of an algorithm (such as using the REPL, Jupyter or a combination of REPL and text editor which is my favorite, thanks to tools like Revise.jl which will automatically compile code as soon as you save the file).
The start will be slow, since the compiler will be aggressively optimizing and compiling everything it can (not unlike if you had just the source code of a C++ program and wanted to run it for the first time), but over time pretty much everything will already be compiled to high-performance code, so the program, if written adequately, will run at a similar speed to C, yet it can still be modified at runtime like any dynamic language (and the modifications get compiled as well).
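The Revise.jl workflow mentioned above is roughly this (the file and function names are just examples):

    using Revise
    includet("experiment.jl")   # tracked include: later edits to the file are picked up automatically

    run_experiment()            # hypothetical function defined in experiment.jl
    # ...edit experiment.jl in your editor and save...
    run_experiment()            # runs the new code; only the changed methods get recompiled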
Why is this being posted? It was written 3 years ago, has been discussed before on this site, and the language has changed enormously in those 3 years.
I gave up on Julia at 0.6 for these reasons a couple of years ago, before giving it another try at 1.1
Unfortunately, while the startup time has improved, I still find the JIT compiler overhead makes it unsuitable for most shorter scripts - the sort of thing where I would normally use python or R. Taking a few minutes to produce the first plot of the day while julia recompiles the plotting package's dependency tree is just a pain.
Edit:
After a fresh install of 1.1, I found loading the Plotly package takes 4 seconds from a fresh repl, while the first plot with the Plotly.jl backend (plot(rand(5,5),linewidth=2,title="My Plot")) takes 14s, and subsequent plots take less than 1s on my computer. This is a marked improvement on 0.6, which is where my few minutes came from. Couldn't get PyPlot to work.
What plotting library were you using? The "first plot of the day" is a bit slow for me, but certainly it doesn't take minutes. I just opened a fresh repl and "using PyPlot" took a little over 5 seconds, and the first plot took around 1.2 seconds. Plots after that are around 0.001 seconds.
Granted, if you were plotting in a small script, I guess every run would take about 7 seconds to get the plotting functionality.
This is what PackageCompiler.jl aims to do, for whatever plotting etc. packages you use all the time. It seems to work well for some people but is, right now, a bit fiddly to get set up.
that's literally "the plotting problem" with julia. I'm currently not actively using julia very much, but if I were, I would still be upset about it. However, for general purpose scripting that doesn't use plots, it's not a problem at all.
I've been working in the data space with R since 2014 and Python/PySpark since 2017, and I'd love to switch to Julia, but whenever I look for "production" examples I don't feel safe enough to start a real, paid project on it.
I have similar feelings as you. In my work people are being pushed to use Julia. However, I feel that most Julia use cases are simple scripts. Most Julia users don't use a debugger or develop tests.
A Julia debugger was introduced recently [1] but its in a very early stage.
To be fair, there were debuggers pre-1.0, but the changes in the language syntax and compiler broke them. That one was just the first one to work with the new architecture that was released on the second half of last year (and since the language is now stable, we can expect it to stay functional and improve over time). There is also another debugger being made with a completely different strategy (embedding the debugger within the compiled code instead of interpreting it):
This article talks about either the insignificant (startup time, syntax), the fairly straightforward to fix (FFI, documentation), or both (printf performance). The title of the page implies that these are intractable problems, and nothing in the content suggests otherwise.
As far as Julia criticisms go, the Dan Luu post felt like it focused more on the right things (https://danluu.com/julialang/, circa late 2014). Of the two, that's the one I'd like to see a follow-up for.
I really hate how printf and sprintf are macros. I have no idea why they would have these in the standard library instead of replacing them with function forms.
I agree with this. I believe the choice of having them as macros rather than functions was made for performance reasons? However, I think they should be in std lib... having to do 'using Printf' everywhere is silly
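For reference, the current shape of it, macros behind an explicit import:

    using Printf

    @printf("%8.3f\n", pi)            # prints "   3.142"
    msg = @sprintf("%d items", 42)    # "42 items"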
I once spent a bit of time playing with DataFramesMeta.jl, which is what I was able to find that seemed like Julia's answer to dplyr, and I thought it was awful precisely because everything was macros. With dplyr, any function that takes a dataframe and returns a dataframe can be thrown into the mix. In DataFramesMeta.jl, if I ran a chain of operations and needed to look at only the first few rows, instead of throwing Julia's head function into the chain I was building, I had to switch modes and wrap everything in the function call. It was aggravating (and yes, this is a very minor thing, but I was only doing very minor work, and as things got more complicated I don't think I was going to miss dplyr's flexibility any less).
Julia isn't yet ideal (but it looks like this will be addressed soon) if your code would run in less than 30 seconds in, e.g., Python, but anything taking longer than that would probably benefit from using Julia. So for most machine learning/data science projects Julia may offer a significant advantage over Python or R.
This seems to be a review of Julia as a general-purpose programming language. For example the reviewer points out early that the review is not targeted at people who like Matlab's syntax.
So this reads a little bit like yet another review of a math DSL, written by and for people who don't need or like math DSLs.
This is subjective, but I've been working and prototyping in a LOT of languages in the past years in various domains. I have to admit that Julia was one of the very few that actually got me excited and made me love writing in it ^_^. As with everything, it has its pros and cons - you just have to find the right balance that fits your needs.
Julia solves the "two language problem" in the high-performance numerical computation domain, in which you have a high-level language that is easy to use as glue (such as Python) and a low-level language for anything that requires performance (such as C++). In Julia, a program that looks like Python can run at the speed of C (and as a bonus has the metaprogramming abilities of a Lisp, minus the s-expressions), and with only one high-level language for everything you can much more easily create, understand, extend and debug anything.
But if you really want to write something that's fast you won't write it in Python, Julia, or these days - C. You would probably use C++ (in a non-C-ish, non-OOPish way), and in some cases (think DBMSes) you'd craft your own LLVM IR and have it JITed.
Why not just use Julia + a few crafty LLVM calls to get to the same place if you want to go that deep? That's essentially what you'd be doing from C++ there.
I like Julia a lot, but the lack of row indices for the data frames is very annoying. I’m used to pandas and saddle and row indices are essential. Can they express anything just columns cannot? No, but they are a useful idea. I just use the Julia pandas wrapper when coding Julia and it works better than the native one. Also the performance is pretty good to.
Also I don’t like the begin/end block delimiters, but it’s not a huge deal. The Julia people keep repeating that curly brackets are too valuable, but I’m not at all convinced.
In addition, Julia should just give up on dynamic typing and make static typing mandatory. I understand why they allow it, but I still don’t agree.
As for static typing/static analysis, that's something that can be built inside Julia (and several projects have done so). Personally, I would not want Julia to insist on static typing and much power comes from not overspecifying types.
Yeah, because C compilers were always magically fast. /s
C code generation only got where it is nowadays thanks to endless money spent in compiler optimization, taking advantage of UB beyond its original goal and compiler specific language extensions that probably will never be part of ISO C.
Nothing that other compilers cannot get if there is willingness and enough money to throw at improving them.
"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."
-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming
> Nothing that other compilers cannot get if there is willingness and enough money to throw at improving them.
Could you consider switching to another quote? I've read that one at least half a dozen times. I promise I will not annoy people here by picking one from the millions of quotes out there that report that C (or similar) is a good abstraction level to develop efficient and maintainable software.
To put that quote in perspective anyway, the way I'm reading it isn't even that other (managed) languages can meaningfully rival C in its core strengths. It seems to be more about how interest in improving those managed languages faded (at that time) and the state of the art of compiling those languages regressed.
And regarding "endless money spent in C compiler optimization", neither is UB extremely interesting for optimization nor is optimization extremely interesting for already well-written programs. (I have only my own humble experience so can't really back this up. But question, when was the last time you got a 10x difference between -O0 and -O2?).
(I've disagreed with quite a few of your comments recently. Just wanted to let you know that the last thing I did was upvote some of your comments ;-])
No, urban myths are hard to kill. Especially now that we have at least two generations that believe C was the genesis of systems programming languages, fast like a thunderbolt since the first compiler got out of the furnace.
I was already hitting on C during the USENET days, back on the glory days of C vs C++ on comp.lang.c, comp.lang.c++, and their moderated variants.
In the case of Julia it's true though, since it's compiled at runtime very similarly to C, via LLVM.
In all my years I've never seen any C/C++ code where Julia couldn't match performance after some tweaking, often with much simpler code.
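You can watch that happen from the REPL (a trivial sketch):

    using InteractiveUtils   # already loaded in the REPL

    add(x, y) = x + y

    @code_llvm add(1, 2)        # LLVM IR for the Int64 specialization: essentially one `add i64` and a ret
    @code_native add(1.0, 2.0)  # native assembly for the Float64 specialization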