Rust should learn from the mistakes of C++ and C, which are among the longest-lasting, highest-impact, most widely deployed languages of all time?
It's confusing when people think language standards are bad, and instead of saying this code is C99 or C++11, they like saying "this code works with the Rustc binary / source code with the SHA256 hash e49d560cd008344edf745b8052ef714b07595808898c835f17f962a10012f964".
C and C++ are widely used despite their language, compiler, and build-system fragmentation. Each platform/compiler combo needs ifdefs and workarounds that have been around for so long they're considered normal (or people declare that MSVC and the rest don't count, and that C just means GCC+POSIX).
There's value in multiple implementations ensuring code isn't bug-compatible, but at the same time C and C++ carry plenty of unnecessary differences and unspecified details, owing to historical accident and the narrow scope of their specs.
Yes, that's how the story goes. Languages with specs are widely deployed despite being fragmented and bad, not because people find value in multiple implementations. Must be a coincidence that C, C++, JavaScript, C#, and Java all fall under this umbrella.
C# has multiple compilers and runtimes? Mono used to be a separate thing, but if I recall correctly Mono has been adopted by MS & a lot has been merged between the two.
JavaScript itself is a very simple language, with most of the complexity living in disparate runtimes. The fact that there are multiple runtime implementations is a very real problem, requiring complex, community-maintained polyfills that age poorly. For what it's worth, TypeScript has a single implementation, and it's extremely popular in this community.
Java is probably the "best" here, but really there are still only the Sun & OpenJDK implementations, and if I recall correctly OpenJDK and Oracle's JDK are basically the same, the main difference being the inclusion of proprietary "enterprise" components that Oracle can charge money for. There are other implementations of the standard, but they're much more niche (e.g. Azul Systems). A point against disparate implementations is how Java on Android is now basically a fork & a different language from modern-day Java (although I believe that's mostly because of the Oracle lawsuit).
Python is widely deployed & CPython remains the version that most people deploy. Forks find it difficult to keep up with the changes (e.g. PyPy lagged quite badly for the longest time, although it seems like they're doing a better job keeping up these days). The forks have significantly less adoption than CPython, though.
It seems unlikely that independent Rust front end implementations will benefit its popularity. Having GCC code gen is valuable, but integrating that behind the rustc front end sounds like a better idea and is way further along: gccrs is targeting a three-year-old version of Rust and still isn't complete, while the GCC backend is already being used to successfully compile the Linux kernel. My bet is that gccrs will end up closer to gcj, because it is difficult to keep up.
Yes: Roslyn, Mono, and some Mono-like thing from Unity that compiles it to C++.
> Mono used to be a separate thing
Mono is still a thing. The last commit was around 3 months ago.
> multiple runtime implementations is a very real problem requiring complex polyfills
You can target a version of the spec and any implementation that supports that version will run your code. If you go off-spec, that's really on you, and if the implementation has bugs, that's on the implementation.
> TypeScript has a single implementation
esbuild can build TypeScript code. I use it instead of tsc in my build pipeline, and only use tsc for type-checking.
> [Typescript is] extremely popular in this community
esbuild is extremely popular in the JS/TS community too. The second most-popular TS compiler probably.
> [Java has] only the Sun & OpenJDK implementations
That's not true. There are multiple JDKs and even more JVMs.
> Java on Android is now basically a fork & a different language from modern-day Java
Good thing Java has specs with multiple versions, so you can target a version that is implemented by your target platform and it will run on any implementation that supports that version.
> Python is widely deployed & CPython remains the version that most people deploy.
> The forks have significantly less adoption than CPython though.
That is because Python doesn't have a real spec or standard, at least nothing solid compared to the other languages with specs or standards.
> It seems unlikely that independent Rust front end implementations will benefit its popularity.
It seems unlikely that people working on an open-source project will only have the popularity of another open-source project in mind when they spend their time.
> Yes. Roslyn, Mono and some Mono-like thing from Unity to compile it into C++.
Roslyn is more like the next-gen compiler and will be included in Mono once it's ready to replace mcs. I view it as closer to Polonius, because it's an evolutionary step that upgrades the previous compiler into a new implementation. It's still a single reference implementation.
> Mono is still a thing
I think you misunderstood my point. It started as a fork, but then Microsoft adopted it by buying Xamarin. It's not totally clear to me whether it's actually still a separate fork at this point; I could be mistaken, but Mono and .NET Core these days share quite a bit of code.
> esbuild can build TypeScript code
Yes, there are plenty of transpilers, because the language is easy to desugar into JavaScript (intentionally so: TS stopped accepting language syntax extensions and follows ES 1:1 now, with all the development happening in the typing layer). That's very different from a forked implementation of the type checker, which is the real meat of the TS compiler.
> The second most popular TS compiler probably
It’s a transpiler and not a compiler. If TS had substantial language extensions on top of JS that it was regularly adding, all these forks would be dead in the water.
> That’s not true. There are multiple JDKs and even more JVMs
I meant to say they're the only ones with any meaningful adoption. All the other JDKs and JVMs are much more niche, and often live in niches that sit behind the adoption curve (i.e. still running Java 8, or willing to stay on older Java versions because that implementation offers some operationally critical benefit).
> Good thing Java has specs with multiple versions, so that you can target a version…
Good for people implementing forks; less good for people living within the ecosystem, who have to worry about which compiler versions to support with their library. For what it's worth, Rust also has language versions (editions), but those are more like LTS versions of the language, whereas Java versions come out more frequently & each implementation snapshots against whatever year it wanted.
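To make the editions point concrete, here's a minimal sketch (standard Rust, nothing hypothetical in it) of how the same code can behave differently per edition; the often-cited case is `into_iter()` on arrays:

    fn main() {
        // On edition 2021 this method call yields owned `i32` values.
        // On editions 2015/2018 the same call resolves, for backwards
        // compatibility, to the slice's `into_iter()` and yields `&i32`.
        for x in [1, 2, 3].into_iter() {
            let n: i32 = x; // type-checks on edition 2021; on 2018 `x` is `&i32`
            println!("{n}");
        }
    }

The edition is an opt-in, per-crate setting (the `edition` key in Cargo.toml), so old code keeps compiling under its old edition while new code gets the new behavior.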
FYI Mono has been shipping Roslyn as its C# compiler for a few years now. Mono's C# compiler only fully supports up to C# 6 while Roslyn supports C# 12, the latest version.
Mono shares a lot of code with .NET (Core), but the sharing is mostly limited to the standard libraries and the compiler. Mono is still its own separate implementation of the CLR (the runtime, the "JVM" equivalent) and supports many more platforms than .NET (Core) does today.
It's probably a matter of time until there's a TypeScript compiler implemented in Rust. But the surface area of the language is pretty big, and I imagine it will always lag behind the official compiler.
> Forks find it difficult to keep up with the changes
It's interesting to think of multiple implementations of a language as "forks" rather than spec-compliant compilers and runtimes. But the problem remains the same: the time and effort necessary to constantly keep up with the reference implementation, the upstream.
There have been plenty of attempts to implement TS in another language. They all struggle to keep up with the pace of change, because the team behind TS is quite large. There was an effort to do a fairly straight port to Rust which actually turned out quite well, but then the "why" question comes up. The reason would be better performance, but improving performance requires changing the design, which a transliteration approach can't give you; and the more of the design you change, the harder it is to keep up with incoming changes, putting you back at square one. I think Rust rewrites of the TS compiler (as long as TS is seeing substantial changes, which it has been) will fare worse than PyPy, which is basically a neat party trick without serious adoption.
Java and C# seem to have actually gotten the idea of multiple implementations correct, in the sense that I have never needed to worry about the specific runtime being used as long as I get my language version correct. I have basically never seen a C/C++ program of more than a few hundred lines that doesn’t include something like #ifdef WIN32 …
Well, there's Android... and Unity, which as I recall is stuck on an old version of C# and has its own way of doing things. I also had the interesting experience of working with OSGi at work a couple of years ago.
Your Android phone and the latest Java share very little commonality. Android only recently gained support for Java 11, which is 5 years old at this point. The other non-OpenJDK implementations you mentioned are much more niche (I imagine the smart cards run Java Card, which is still probably an OpenJDK offshoot).
Again: I'm not claiming that alternative implementations don't exist, just that they're not particularly common/popular compared with OpenJDK/Oracle (which are largely the same codebase). Android is the only alternative implementation with serious adoption, and it lags quite heavily.
BTW GraalVM is based on OpenJDK so I don’t really understand your point there. It’s not a ground-up reimplementation of the spec.
GraalVM uses a whole different infrastructure: a different JIT compiler and GC, which affect runtime execution and the existing tooling.
It doesn't matter how popular they are. They exist because there is a business need, and several people are willing to pay for them, in some cases lots of money, because they fulfill needs not met by OpenJDK.
I don't think what you're describing here is accurate re: GraalVM. GraalVM is HotSpot, but with C1/C2 not being used; it uses a generic compiler interface to call out to Graal instead, AFAIK.
Only if you are describing the pluggable OpenJDK interfaces for GraalVM, which were removed a couple of releases after being introduced, as almost no one used them.
I have been following GraalVM since it started as the Maxine VM at Sun Research Labs.
Who said anything about the latest Java? Since Java has versioned specs, platform X can support one version, and you can target it based on that information, even as other versions come out and get supported by other platforms.
For example C89 and C99 are pretty old, and modern C has a lot that is different from them. But they still get targeted and deployed, and enjoy a decent following. Because even in 2024, you can write a new C89 compiler and people's existing and new C89 code will compile on it if you implement it right.
But as a developer I do want to use (some of) the latest features as soon as they become available. That's why most of my crates have an N-2 stable version policy.
It is the sort of thing that people aimed for in the '90s, when there was more velocity in C and C++. If Rust lives long enough, its rate of change will also slow down.
Or do languages get specs because they are widespread and becoming fragmented? That clearly applies to C and C++: both were implemented first, and a formal spec was written only later, in response to fragmentation.
> It's confusing when people think language standards are bad, and instead of saying this code is C99 or C++11, they like saying "this code works with the Rustc binary / source code with the SHA256 hash e49d560cd008344edf745b8052ef714b07595808898c835f17f962a10012f964".
I don't know if that's totally fair. I remember it took quite a while for C++ compilers to actually implement all of C++11, so it was totally normal back then to change which subset of C++11 we were using to appease whatever version of GCC was in RHEL at the time.
And in the rare instances where you're using in-development features, you'd say "rust nightly-2023-12-18".
Literally the only reason to specify via a hash* would be if you were using such a bleeding edge feature that it was only merged in the last 48 hours or so and no nightly versions had been cut.
*Or I suppose you don't trust the supply chain, and you either aren't satisfied with or can't create tooling that checks the hash against a lockfile, but then you have the same problem with literally any other compiler for any language.
Indeed. If I'm specifying a hash for anything I'm definitely not leaving things up to the very, very wide range of behaviours covered by the C and C++ standards.
That's beside the point. Adhering to a language standard is much clearer than specifying a language compiler's version. Behaviour is documented in the former, while with the latter one has to observe the output of a binary (and hope that side effects are understood in their full gravity).
But no one writes code against the standard. We all write code against the reality of the compiler(s) we use. If there's a compiler bug, you either use a different version or a different vendor, or you sidestep the code triggering the issue. The spec only tells you that the vendor might fix this in the future.
This is definitely not true. Whenever I have a question about a C++ language feature I typically go here first[0], and then if I’m looking for compiler specific info I go to the applicable compiler docs second. Likewise, for Java I go here[1]. For JavaScript I typically reference Mozilla since those docs are usually well written, but they reference the spec where applicable and I dig deeper if needed[2].
Now, none of these links are the specifications for the languages listed, but they all copiously link to the specification where applicable. In rare situations, I have gone directly to the specification. That’s usually if I’m trying to parse a subset of the language or understand an obscure language feature.
I would argue no one writes code against a compiler. Sure we all validate our code with a compiler, but a compiler does not tell you how the language works or interacts with itself. I write my code and look for answers to my questions in the specification for my respective languages, and I suspect most programmers do as well.
If the compiler you use and the spec of your language disagree, what do you do?
The Project is working on a specification. The Foundation hired someone for it.
A Rust spec done purely on paper ahead of time would be the contents of the accepted RFCs. The final implementation almost never matches what was described, because during the lengthy implementation and stabilization process we encounter a multitude of unknown unknowns. The work of the spec writers will be to go back and document the result of that process.
For what it's worth, the seeming foot-dragging on this is because the people who would be qualified and inclined to do that work were busy with other, more pressing matters. If we had had a spec back in, let's say, Rust 1.31, what would that have changed, in practice?
> If the compiler you use and the spec of your language disagree, what do you do?
If the compiler claims to follow the specified version of the spec, and it doesn't, you file a compiler bug.
And then use the subset that it supports, perhaps by using an older spec if it supports that fully. Or perhaps look for alternative compilers that have better/full coverage of a spec.
"Supports the spec minus these differences" is still miles better than "any behaviour can change because the Rust 2.1.0 compiler compiles code that the Rust 2.1.0 compiler compiles".
> If the compiler claims to follow the specified version of the spec, and it doesn't, you file a compiler bug.
> And then use the subset that it supports, perhaps by using an older spec if it supports that fully. Perhaps looking for alternative compilers that have better/full coverage of a spec.
If you encounter rustc behavior that seems unintentional, you can always file a bug in the issue tracker against the compiler or language teams[1]. Humans end up making a determination of whether the behavior of the compiler is in line with the RFC that introduced the feature.
> "Supports the spec minus these differences" is still miles better than "any behaviour can change because the Rust 2.1.0 compiler compiles code that the Rust 2.1.0 compiler compiles".
You can look at the Rust Reference[2] for guidance on what the language is supposed to be. It is explicitly not a spec[3], but the project is working on one[4].
Can you articulate what a spec for a programming language entails in your mind? I'm sure it's a niche opinion/position, but I base any discussion of formal specification of a language on 'The Definition of Standard ML'. With that as a starting point, I don't see how the implementation process of a compiler could force a change to the spec: the formal syntax is defined, the static and dynamic semantics of the language are laid out, and those semantics are proved consistent and possessing 'type safety' (for some established meaning of the words). Any divergence or disagreement is a failing of the implementation. I'm genuinely interested in what you expect a specification of Rust to be if, as your comment suggests, you have a different point of view.
A specification can be prescriptive (stating how things should work) or descriptive (stating the observable behavior of a specific version of the software). For example, earlier in Rust's post-1.0 life the borrow checker changed behavior a few times: some changes fixed soundness bugs, some enabled more correct code to be accepted (NLL). An earlier spec would have described different behavior from what Rust has today (though of course the spec can be updated over time as well).
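As a concrete illustration of that borrow-checker evolution, this is the classic shape of code that the original (lexical) borrow checker rejected and NLL accepts; a descriptive spec written against the pre-NLL compiler would have called this an error:

    fn main() {
        let mut x = 5;
        let y = &x;        // shared borrow of `x`
        println!("{y}");   // last use of `y`
        // Pre-NLL, `y`'s borrow lasted until the end of the scope, so
        // this mutation was an error. NLL ends the borrow at the last
        // use of `y`, so this compiles on any recent rustc.
        x += 1;
        println!("{x}");
    }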
How should the Rust specification represent the type-inference algorithm that rustc uses? Is an implementation that can figure out types in more cases than described by the specification conformant? This is the "split ecosystem" concern some have with the introduction of multiple front ends: code that works on gccrs but not rustc. There's no evidence that this will be a problem in practice; everyone involved seems to be aligned on that being a bad idea.
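For a sense of what's at stake, here's a hedged sketch of inference-dependent code: nothing in it is exotic, but whether it compiles depends entirely on how much the front end is willing to infer, which is exactly what a spec has to pin down:

    fn main() {
        // The element type of `v` is not written anywhere; it is only
        // pinned down by the `push` below. A front end with weaker
        // inference might demand an explicit `Vec<i32>` annotation.
        let mut v = Vec::new();
        v.push(1i32);
        // The closure's parameter type and the `collect` target are
        // likewise inferred from context.
        let doubled: Vec<_> = v.iter().map(|x| x * 2).collect();
        println!("{doubled:?}");
    }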
> Any divergence or disagreement is a failing of the implementation.
I think it was probably a rhetorical question, but with regard to type checking: as with ML, type inference is an elaboration step in the compiler that produces the correct syntax of the formal language defined in the language definition. Specifying the actual implemented algorithm that translates from concrete syntax to abstract syntax (and the accompanying elaboration to explicit type annotations) is a separate component of the specification document, one that does not exert any control or guidance over the definition of the formal language in question.
I think this may be a large point of divergence in my understanding of and position on language specifications. I assumed that when posts said Rust is getting a spec, that included a formal, "mathematically" proven definition of the core language. I am aware that is not what the C, C++, or JS specs include (and I don't know whether that is true for Ada), but given Rust's inspiration from OCaml and its limited functional-esque stylings, I was operating under the assumption that the language would follow the more formalized definition style I am used to.
But barely anyone gets to write C++11. You write C++11 for MSVC2022, or C++11 that compiles with LLVM 15+ and GCC 8+, and maybe MSVC if you invest a couple hours of effort into it. That's really not that different from saying you require a minimum compiler version of Rust 1.74.0.
C and C++ both learn from the mistakes of others too. Of course, as mature languages there is a lot they cannot change. However, when they do propose new features, it is common to look at what other languages have done. C++'s thread model is better than Java's because its designers were able to look at the things Java got wrong. (In hindsight, that is; those choices looked good at the time to smart people, so let's not pick on Java for failing to predict how modern hardware would evolve. Indeed, it is possible that in a few years hardware will evolve differently again, and C++'s thread model will then be just as wrong, despite me calling it good today.)
I think people generally specify their MSRVs as version numbers for libraries, and often pin a toolchain file for applications. I haven't seen anyone use a hash for this, though I may well have missed something.
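For what it's worth, those two mechanisms look roughly like this (a minimal sketch; the crate name and version numbers are just placeholders):

    # Cargo.toml -- a library declaring its minimum supported Rust
    # version (MSRV); cargo errors out on older toolchains.
    [package]
    name = "example-lib"      # hypothetical crate name
    version = "0.1.0"
    edition = "2021"
    rust-version = "1.74.0"

    # rust-toolchain.toml -- an application pinning the exact toolchain
    # rustup should install and use, down to a dated nightly if needed.
    [toolchain]
    channel = "1.74.0"        # or e.g. "nightly-2023-12-18"

Either way it's a version, not a hash.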