The author touched on the value of the original author, and this is something missing from the RiiR projects I've encountered. An open source project is more than just code. It's the community: the original authors and maintainers and their years of experience in that problem space. If you don't have a plan to move the community over to Rust (and the RiiR projects I've seen did not) you're (A) duplicating effort, (B) ignoring the value of engineering experience, and (C) making a major decision for an open source project (rewriting it) without collaboration with the current maintainers. (C) is anti-collaborative, and an extreme measure that should only happen in extreme circumstances (not "just because it might be better").
At my employer, the number one reason (I'm aware of) to open source our internal software is to solicit collaboration from other companies. Take away collaboration, and it's going to be a harder sell to open source anything.
> (C) is anti-collaborative, and an extreme measure that should only happen in extreme circumstances (not "just because it might be better").
I agree with A and B, but C just feels like gate-keeping. Open source licenses do not require collaboration with the original authors. You can freely fork any open-source codebase and modify it as you wish without asking anyone's permission.
You can... and certainly cloning a git repo or hitting the "fork" button on GitHub is a normal part of the development workflow. If you're experimenting, those changes may be discarded. If you're contributing, those changes go back upstream. But I don't think either of those is what the author means by fork. Other folks have talked about this. [1] [2]
I think the author means: if you're maintaining changes in your fork indefinitely and advertising this fork to others as superior to the original to use or as a canonical place to file issues, you should have a good reason. Communities are valuable; please don't divide them or throw them away lightly. Try working with the original maintainer first.
It may be superior if there's ongoing maintenance and improvements on the original code.
RiiR certainly has its place, though. Ripgrep, for me as a user, is vastly superior to grep, and a large reason for that, as I see it, is the safe optimizations allowed by the borrow checker. They're certainly possible in C, but they wouldn't be maintainable.
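A hedged illustration of the kind of optimization the borrow checker makes safe to maintain: returning zero-copy slices into the original buffer, which the compiler guarantees cannot outlive it. (The function name here is my own for illustration, not ripgrep's actual API.)

```rust
// Return matching lines as zero-copy slices into `haystack`.
// In C, handing out interior pointers like this is easy to get wrong
// under later refactoring; in Rust the borrow checker rejects any use
// of the returned slices after `haystack` goes away.
fn matching_lines<'a>(haystack: &'a str, needle: &str) -> Vec<&'a str> {
    haystack
        .lines()
        .filter(|line| line.contains(needle))
        .collect()
}

fn main() {
    let text = "foo bar\nbaz\nbar qux\n";
    let hits = matching_lines(text, "bar");
    assert_eq!(hits, ["foo bar", "bar qux"]);
    println!("{:?}", hits);
}
```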
Also, I'd look at how stable the library you're wrapping is and the size of its code base. It may be lower effort to rewrite than to wrap.
A good reason is that your fork is superior _and_ you have exhausted reasonable efforts at sharing the improvements with upstream. There's some threshold of how superior but it's subjective.
I think the author's point is not that you can't RiiR, it is that one should think twice if the resultant code is not supported. If there is a Rust community around it, great! But if it is just doing it just to RiiR, then (C) holds.
I did not mention anything about GitHub. I am talking about what you can freely do with any open-source software without being made to feel like you are breaking the law or doing something immoral. If you aren't comfortable with people building on top of or making derivatives of your work, do not open-source it.
You are free to fork, but that doesn't make it right.
Imagine a well-funded startup trying to make a name for themselves decided to fork the top 10 emerging open source projects, put their company name on them, and then spend millions in PR/marketing so that (A) it seems like they invented them, and (B) they fork the community. Are they free to? Is it right? What will it mean for the original authors and experts who created the software?
Forking for whatever reason is in the spirit of open source. Asking if something is right is nonsense. From an ethical point of view, it follows the license of the project so there is no issue.
If the author of the original source had an issue with the above scenario, they would've chosen a more restrictive license that didn't make it possible.
> From an ethical point of view, it follows the license of the project so there is no issue.
This seems like the opposite of the usual ethical/legal distinction. It’s definitely legal to fork the project and the community, but not necessarily ethical.
I don’t think you can blame authors for not using a license that describes exactly what they are happy with. What constitutes an “ethical” fork is subjective, based on how necessary the fork is, and I don’t think a license could describe that.
Also, consider this analogy. The MIT license doesn’t contain a patent grant. So if some MIT-licensed software uses a patented technique, the software author and patent holder is within their legal rights to sue all forks of their software for patent infringement. But this would be unethical, as the MIT license implies that the licensed software is given freely, not that you will sue all users of your software. Would you blame the sued parties for choosing to use MIT-licensed software, the way you are saying you would blame software authors for choosing a license that doesn’t forbid certain forks?
It is gate keeping. Pretty dismayed to see who is peddling it here. Corporate sycophant now I guess.
Really smart computer nerds emotionally wrapped up in change they aren’t interested in. They know what the future needs! To stay just like yesterday!
Gen X has become the old geezers who want you to stay off their lawn.
Oh what? It may not be super popular like Linux? We all have to work on the same code bases? Individual drive and curiosity verboten! Your one person project won’t satisfy the general user base? You should just focus on computing as we see it.
Who cares. Generate whatever syntax you want. They don’t have to organize around the semantics if they don’t want.
This is a good point. And unfortunately, in a long-lived language the network effect means you have almost no hope of moving a whole community over to a new language at once. Just look at what happened (or rather, didn't happen) with Python 2 libraries. And that was a less-old and less-permeated language than C(++).
So maybe instead of "rewrite it in Rust", the answer is "build a new solution in Rust that meets the needs of a community of people who already use Rust".
(1) You have perfectly good software and your only motivation to rewrite is infatuation with how great Rust seems. In this case, yes, that's unproductive. (Infatuation-driven design is poor engineering and is a fairly widespread problem in the software industry.)
(2) You already have other reasons to want to rewrite (design could be improved, code is in a poor state, etc.), and you're deciding between languages. In that case, a rewrite could be productive even if you didn't use Rust.
(3) You have mature software that appears relatively free of obvious, known bugs, but it's written in an unsafe language, and it would be valuable to have the additional confidence that a safe language could provide. Security-sensitive or other critical applications are where this is most likely to make sense.
How about (5): you're trying to learn Rust and you need a project to work on? Irresponsible for production code, but perfectly valid for a weekend project. And in my experience, it can be a great way to learn the quirks of a particular language!
Yeah, sometimes that makes sense. If the lib in question is small enough, it might be more valuable to port, to avoid having a multi-language project and the effort of the FFI.
You can compile Rust in one go. If you rely on native libraries, you need to be able to link them during cargo build, and that's painful to set up. Not impossible... just really annoying to manage.
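For illustration, linking a native library usually means a build.rs along these lines -- Cargo runs it before compiling and reads its stdout as linker instructions. The library name "foo" and the search path are placeholders; making them resolve correctly on every build machine is the annoying part. (Factored into a helper here just to make the directives visible; a real build.rs would typically println! them directly.)

```rust
// build.rs (sketch): Cargo parses lines printed to stdout as
// build instructions. "foo" and the path below are placeholders
// for whatever native library you depend on.
fn directives() -> Vec<String> {
    vec![
        // Where the prebuilt native library lives (platform-specific).
        "cargo:rustc-link-search=native=/usr/local/lib".to_string(),
        // Link against libfoo; dylib vs. static is another knob to manage.
        "cargo:rustc-link-lib=dylib=foo".to_string(),
        // Only re-run this script when it changes.
        "cargo:rerun-if-changed=build.rs".to_string(),
    ]
}

fn main() {
    for d in directives() {
        println!("{}", d);
    }
}
```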
I'm not positive, but I think GP intended to refer to the idea that you would need multiple compilers to build the project, rather than cross-compiling to another platform.
> (2) You already have other reasons to want to rewrite (design could be improved, code is in a poor state, etc.), and you're deciding between languages.
It would be wise to ignore the language distraction and remember that most urges to rewrite are just flat out wrong. The standard and well-respected advice about rewrites doesn't go out the window just because of a fad language; if anything, it applies more strongly.
Much of what people perceive as cruft is actually most of the value -- the stored knowledge from years of experience using the software in practice. Few "clean" programs are actually complete.
Mature software can have a bunch of hard to fix bugs that you can't motivate anyone to work on any longer. If you want to continue making progress you have to route around the damage somehow.
There is an automated tool, c2rust[1], that automates a huge chunk of the translation from C to Rust. Moreover, it has a refactoring tool[2] that is scriptable in Lua. One good example of such a conversion (still not finished, though), while still producing the same output, is a rewrite[3][4] of the XeTeX engine in Rust (as part of the Tectonic[5] engine). It is able to parse the arXiv articles dump and still generate valid PDFs.
There is some concept[6] of converting Java to Rust, but it is far from being as useful as the c2rust tool. It would be nice to integrate it with c2rust somehow, if someone wants to help.
I wrote a comparison of the quality of this autogenerated C2rust code versus the original sources (WEB code in this case), the last time this was posted: https://news.ycombinator.com/item?id=21176806 (Though it's not a fair comparison as this particular automatic translation itself started from automatically translated C code, for unclear reasons.)
Last time I looked at it, c2rust translated from C to unsafe Rust, right? I'll point out a (neglected) project[1] to (partially) auto convert from C to a safe subset of C++. For example, if you need a png encoder/decoder library written in C++, perhaps the safest one is here [2].
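To make the distinction concrete, here is a hedged sketch (not actual c2rust output) of the shape a mechanical C-to-Rust transpile takes -- raw pointers, explicit lengths, unsafe -- next to the safe rewrite a human would produce afterwards:

```rust
// Roughly the shape of a mechanical transpile: raw pointers, manual
// indexing, and unsafe. Illustrative only, not real c2rust output.
unsafe fn sum_transpiled(data: *const i32, len: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < len {
        total += *data.add(i); // no bounds check, caller's responsibility
        i += 1;
    }
    total
}

// The idiomatic safe rewrite: bounds are carried by the slice type.
fn sum_safe(data: &[i32]) -> i32 {
    data.iter().sum()
}

fn main() {
    let v = [1, 2, 3, 4];
    let a = unsafe { sum_transpiled(v.as_ptr(), v.len()) };
    assert_eq!(a, sum_safe(&v));
    println!("{}", a);
}
```

The value of the automated step is that both versions can be differentially tested against each other while the unsafe one is incrementally replaced.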
This article is wonderful, and I hope it's a setup for a second article: "How to Rewrite it in Rust"!
> However at best, the temptation to RiiR is unproductive
> A much better alternative is to reuse the original library and just publish a safe interface to it.
Just as models are a lower-dimensional representation of a more complex problem,[1] I have to reiterate that there are no truth(ism)s in software; as in all engineering, there are trade-offs. RiiR can be, and often is, a valid choice. The author talks about "introducing bugs", but under an engineering approach to a rewrite -- more like a language port -- rewrites can _find_ a lot of bugs. One such way is as follows.
1. Keep the same interface in the new system; client code should work against either system.
2. Have a body of integration tests; capture these from the field, OR write a small collection of orthogonal tests, somewhere between unit and integration.
3. Use the tooling (bindgen, etc.) and generate as much of the interface programmatically as possible.
4. Iterate on the port, doing differential testing against both systems.
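The steps above can be sketched as a differential test: run the same inputs through the original implementation (here a stand-in function; in practice the C library called through bindgen-generated FFI) and the port, and compare. The checksum functions below are placeholders for whatever interface the two systems share.

```rust
// Stand-in for the original C implementation (in practice this
// would be an extern "C" call through bindgen-generated bindings).
fn checksum_original(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

// The port under test, kept behind the same interface (step 1).
fn checksum_port(bytes: &[u8]) -> u32 {
    let mut acc: u32 = 0;
    for &b in bytes {
        acc = acc.wrapping_mul(31).wrapping_add(b as u32);
    }
    acc
}

// Step 4: differential testing against both systems. In practice the
// inputs would come from field captures or orthogonal tests (step 2).
fn main() {
    let inputs: [&[u8]; 4] = [b"hello", b"", b"\x00\xff\x7f", b"captured from the field"];
    for input in inputs {
        assert_eq!(
            checksum_original(input),
            checksum_port(input),
            "divergence on input {:?}",
            input
        );
    }
    println!("all inputs agree");
}
```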
Doing a language port is comparable in work to long-term maintenance and refactoring of an existing codebase. As the tooling gets better, RiiR will become a smoother oxidization.
If you own the C/C++ that could get rewritten, I'd think one needs to rationalize NOT doing RiiR: better tooling (IDE, build), improving perf tooling, low bug count, increased team velocity, etc.
But the biggest reason to RiiR is safety. Integrating a body of C/C++ code into your Rust codebase introduces a huge amount of unsafe code, much worse than surrounding your entire Rust program with unsafe { }.
At least do what is outlined in the article but ALSO compile the native code into Wasm and run it from within a sandbox.
Edit: something like a library for reading (parsing) a file format should absolutely be RiiR'd or run from a sandbox. Data parsing and memory corruption vulnerabilities are excellent dance partners.
100% agree. We had a C++ service that made heavy use of libcurl. A particular release of libcurl introduced some memory safety problems that caused frequent segfaults for us. These memory safety bugs were eventually fixed in another release, but it scared us enough that we investigated rewriting the service in Rust.
After successful experimentation with some prototypes, we eventually rewrote the service in Rust, auditing the unsafe code in our dependencies (of which there was very little). No segfaults ever since.
Side benefit: Since Rust networking libraries tend to have strong async support, and since some of our C++ libraries performed synchronous networking operations, we saw a big improvement in performance. The number of threads needed dropped by 5x.
> But the biggest reason to rewrite it in Rust is safety.
This is not a good reason if there are cheaper ways to get safety (at least for existing codebases). Indeed, there are sound static analysis tools (like TrustInSoft) that guarantee no undefined behaviour in C code. Using them is not completely free, and it may even require adding annotations to the code or even changing the code, but it does seem significantly cheaper than a rewrite (in any language). Such sound static analysis tools are already being used for safety-critical systems in C, and I believe they are more popular than a rewrite in Rust in that domain (where there are reasons not to rewrite in Rust other than just cost, though).
Undefined behavior is not the only kind of bug in C programs, and it's far from clear that fixing all the bugs, or even all the undefined behavior, in an existing C library will be less effort than rewriting it. Consider that one of the worst security bugs in history was a result of Kurt Roeckx eliminating an undefined-behavior bug from OpenSSL.
It's pretty much the only kind of bugs that Rust can prevent, too.
> and it's far from clear that fixing all the bugs, or even all the undefined behavior, in an existing C library will be less effort than rewriting it
It's pretty clear to me.
> Consider that one of the worst security bugs in history
I'm not sure what bug you're referring to, but if it's Heartbleed, then that was an undefined-behavior bug. Of course, a functional bug can be introduced at any time, including during a rewrite in another language.
No, I'm talking about the Debian OpenSSL bug Luciano Bello discovered. It was a lot worse than Heartbleed. Kurt Roeckx didn't introduce Heartbleed, and Heartbleed wasn't introduced by removing undefined behavior, so there is no plausible reason for you to infer that I was talking about Heartbleed.
As for the cost of rewrites, there's a lot of evidence from software project metrics that the cost of modifying software can easily exceed the cost of rewriting it; see Glass's Facts and Fallacies of Software Engineering for details and references. Also, though, it should be intuitively apparent (though perhaps nonobvious) that this is a consequence of the undecidability of the Halting Problem and Rice's Theorem — it's impossible to tell what a given piece of software will do, which means that the cost of reproducing its existing behavior in well-understood code is unbounded.
> so there is no plausible reason for you to infer that I was talking about Heartbleed.
Except that Heartbleed is the only OpenSSL bug I've heard of :) Also, I don't know who Kurt Roeckx is.
> there's a lot of evidence from software project metrics that the cost of modifying software can easily exceed the cost of rewriting it
But we're not talking about arbitrary modification, but about, at worst, fixing undefined behavior, which requires only local modifications (or Rust wouldn't be able to prevent that either). As an ultimate reduction, you could choose to rewrite the software in C and still use sound static analysis to show lack of undefined behavior.
> which means that the cost of reproducing its existing behavior in well-understood code is unbounded.
Yes, but that still doesn't mean that a rewrite is cheaper. Also, while your conclusion is correct, your statement of Rice's theorem is inaccurate: it's impossible to always tell what every piece of software will do. It's certainly possible to tell what some software will do, at least in some cases, or writing software would be impossible to begin with.
I appreciate your clarification! Indeed, I didn't mean it was impossible to tell what any software would do in any situation, only some software (in practice, nearly all) in some situations. The contrary would imply that not only writing software but also running it would be impossible.
Your blog post looks very interesting indeed! I will read it with care.
I do think there's a subtle point about modifying software. Not just any modification of the software that lacks undefined behavior will do; we want a modification that preserves the important aspects of the original software’s behavior. Not only is this easy to get wrong—as shown spectacularly by the OpenSSL bug (which you've presumably looked up by now), but also, for example, by the destruction of the first Ariane 5—but there is no guarantee that it can be done with purely local modifications, even if the final safety property you wanted to establish can be established with chains of local reasoning.
I do agree that sound static analysis of C that is written to make that analysis tractable is just as effective as rewriting in Rust. Not only can such analysis show the absence of undefined behavior, it can show arbitrary correctness properties, including those beyond the reach of Rust’s type system. Probably the strongest example of this kind of analysis is seL4, although now its proofs verify not only the C but also the machine code, thus eliminating the compiler from the TCB.
Yes, I looked up the OpenSSL bug you referred to, and I think it's quite unusual. I'm not sure what those lines were exactly, but from the description it seems like it was intended to read uninitialized memory, something that (safe) Rust won't let you do, either. Also, it's probably wrong even in C, but it worked. So yeah, touching code in any way is not always 100% safe, but my point was just that sound static analysis is still cheaper than a rewrite, as it requires far less modification.
As to seL4, it isn't exactly similar to sound static analysis, as the work was extremely costly. All of seL4 is 1/5 the size of jQuery, and it took years of work. But it also includes functional verification, not just memory safety. In fact, it is among the largest programs ever functionally verified to that extent, and yet it was about 3 orders of magnitude smaller than ordinary business software, roughly the same verification gap we've had for decades. We don't yet know how to functionally verify software (end-to-end, like seL4) of any size that's not very small.
Anyway, Rust offers a much more limited form of assurance, and sound static analysis tools offer the same, and at a lower cost for existing codebases.
It's not that significant. We can tell what the vast majority of existing software will do in an automated way. Compiling a program is the equivalent of encoding its semantics in another language, which implies knowing what it will do -- at least that's one way of 'knowing what it will do'.
You can write down the physical laws that apply to a given system, but we don't usually call that "knowing what it will do", unless you can actually predict the state, or in the case of a program, the output. The mere fact that compilers exist is a meaningless form of "knowing what the program will do", only superficially relevant. You can't solve the halting problem with compilers in the same way that Newton's laws don't solve the three-body problem.
Also, did you catch the part where the point is about how expensive it is?
> but we don't usually call that "knowing what it will do", unless you can actually predict the state, or in the case of a program, the output.
Who is 'we'? And yes, we can predict exactly what the output of a given program for a given input is, for the vast majority of cases. All you have to do is run the program.
> The mere fact that compilers exist is a meaningless form of "knowing what the program will do", only superficially relevant.
You think static analysis, type checking, intermediate representation, optimization, the translation of the program with exact semantics into another language, etc. - is 'superficially relevant' to understanding a program?
> You can't solve the halting problem with compilers
Now that's pretty irrelevant.
> Also, did you catch the part where the point is about how expensive it is?
Did you catch the part where I was only commenting on a specific part of the comment? But tell me, how expensive is it?
It would be hard to overstate how incorrect this statement is, if it is read with the implicit qualifier "for all possible inputs", without which my comment above would be obvious nonsense. Of course we can tell what most programs will do for some inputs—we can just run them!
Yup, we can tell what most existing software will do for all inputs. Rice's theorem states that we can't tell what all software will do, not that it's impossible to tell what a given piece of software will do.
The fact that we can determine what a piece of software will do, doesn't mean we always do that kind of analysis, or that the programmer fully understands his own code. That's why we have type systems, constraints, verification tools, etc.
In practice, how much does “not completely free” cost? TrustInSoft doesn’t even publish prices, which pretty strongly suggests I couldn’t afford it nor persuade a manager to expense it.
That's not always true. While I have no idea of what this particular solution costs, I have seen licensing costs that would easily pay for a 20+ person team.
The size of the software is important to consider when you are looking at the cost of rewriting.
While not as mature as Rust, static enforcement of C++'s (memory and data race) safe subset is coming along [1].
IIUC, tools like TrustInSoft are for situations where you need not just safety but "reliability" (i.e., no crashes, no exceptions). That doesn't really scale so well to larger applications.
TBH I don't see a future where RIIRing every piece of software would make sense, and I think you sort of put it in there somewhere in your comments.
But yes I do see that for many it would and for those, RIIR would IMHO be an incremental process of oxidizing your project to the point that there's nothing left but Rust.
Writing something from scratch and waiting for it to finish is the biggest problem we face, because eventually the will to continue with the effort just dies.
But creating meaningful ground, not just some "safer" bindings, does help in inviting others to share the effort as well.
So hopefully people will be smart and identify what they should do for their projects, should it be a rewrite or should it be a new feature that you write in Rust. :)
And no, I don't think bindings are the solution to anything; they serve no purpose in the Rust community as a long-term measure. You shouldn't have to sacrifice safety and/or performance in the case of a high-level language.
> TBH I don't see a future where RIIR every piece of software would make sense
It doesn't necessarily have to be Rust, and it's a process that may take decades. But I really do think that absolutely everything should be rewritten in a memory-safe language (i.e. not C/C++).
The vast majority of security issues are either stupid misconfigurations or memory-safety issues. And currently most security initiatives are undermined by the fact that they're resting on insecure foundations. Imagine if our computing platforms were truly secure. It would be a revolution, and IMO it's a revolution that's coming.
Exactly, when we talk about RiiR, we are saying, "I would like to have the guarantees that Rust provides". If the Rust ecosystem depends heavily on C/C++ libraries, then the system has the properties of the union of all the flaws and the properties of the language do not translate into a quality of the ecosystem.
That said, I think containing C/C++ to a Wasm sandbox that can be integrated transparently with Rust would be a gigantic win for security, correctness and the Rust ecosystem.
>"I would like to have the guarantees that Rust provides". If the Rust ecosystem depends heavily on C/C++ libraries, then the system has the properties of the union of all the flaws and the properties of the language do not translate into a quality of the ecosystem.
Of course ideally you get rid of all C/C++, but that may not be feasible. Combining Rust with some C/C++ still adds the benefit of Rust safety to the new code you write. The contained C/C++ is like the unsafe section of Rust code, a place where you have to be more vigilant, but you're slowly constraining the space where these issues can arise. And you can iteratively only replace those parts that cause a lot of issues.
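That containment follows the usual safe-wrapper pattern: the unsafe core (standing in here for a C function reached over FFI; the names are illustrative) is hidden behind an API that checks every invariant the core assumes, so the rest of the codebase never touches unsafe directly. A minimal sketch:

```rust
// Unsafe core: a stand-in for a C function reached through FFI.
// Caller must guarantee `index` is in bounds.
unsafe fn raw_get(ptr: *const u8, index: usize) -> u8 {
    *ptr.add(index)
}

// The safe wrapper validates the invariant before crossing the
// boundary, so callers get an Option instead of potential UB.
fn checked_get(data: &[u8], index: usize) -> Option<u8> {
    if index < data.len() {
        Some(unsafe { raw_get(data.as_ptr(), index) })
    } else {
        None
    }
}

fn main() {
    let buf = [10u8, 20, 30];
    assert_eq!(checked_get(&buf, 1), Some(20));
    assert_eq!(checked_get(&buf, 99), None); // out of bounds: an Option, not UB
    println!("ok");
}
```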
TBH I still don't see it happening, or being the case, at least in the foreseeable future...
Why, you may ask? The answer is simple: because there is always a cost.
So the question is what is easier, or when one can say the cost of using a "safer" language outweighs that of a "less safe" one.
Because otherwise Rust is also too damn unsafe; just move to something completely safe -- Idris could be a good starting point. /s
Putting the sarcasm aside: Rust does not guarantee memory safety; it tries to help with it. (Check reference cycles.)
The safest feature of Rust would be its protection against data races.
So at the end it's all about the trade-offs and deciding what you are willing to sacrifice and what you aren't...
Over the medium term (years to decades), there is no question that rewriting in a safe language is worth the cost. It's a one-time project, and security issues cost our economy billions on an ongoing basis.
Rust isn't perfect, and perfect automatically-checked safety probably isn't possible, but it's dramatically safer than C/C++, and it cuts down the amount of code that would need to be manually audited to the point where auditing it comprehensively becomes feasible.
I agree that everything should be made in a memory safe-by-default language like Rust and I really need to see the Rust ecosystem really commit to this. Too many times will I come across a library with some very questionable uses of `unsafe`, which in my opinion has no place in things like HTTP request libraries or web frameworks.
When was the last time a Java or .NET codebase had an RCE from writing past the end of a list? The only RCEs in safe languages generally only happen in uncommon explicit code-loading calls or explicitly unsafe memory access calls.
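Safe languages turn that class of bug into a deterministic failure instead of memory corruption. A hedged Rust illustration of the same property the parent describes for Java/.NET:

```rust
fn main() {
    let list = vec![1, 2, 3];

    // Checked access: out-of-range yields None, not a wild read.
    assert_eq!(list.get(10), None);

    // Even the indexing operator is bounds-checked: this panics
    // deterministically instead of reading or writing past the
    // allocation, which is what turns a bug into an RCE in C.
    let result = std::panic::catch_unwind(|| list[10]);
    assert!(result.is_err());
    println!("out-of-bounds access was caught, not exploited");
}
```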
More than Rust in multithreaded scenarios, actually.
Not only do they have automatic memory management, they have an industry-standard memory model for hardware access, adopted by the C and C++ standards and used as inspiration for std::atomic<> on CUDA.
While you keep being baffled, where is Rust's memory model specification?
There's a good argument to be made that Rust is safer in multithreaded scenarios than the JVM or .NET.
Rust statically prevents inappropriate unsynchronized accesses for arbitrary APIs (for instance, the compiler will emit an error when attempting to mutate a non-concurrent object/data structure from multiple threads). Those VMs make unsynchronized mutations of individual memory locations work just enough to not be unsafe, but still allow arbitrary unsynchronized operations. These will likely end up with an incorrect result, even if there's no memory unsafety, and may completely violate any internal invariants of the object or data structure. This latter point means one cannot write a correct abstraction that relies on its invariants without explicitly considering threadsafety (it is opt-in safety), whereas Rust has this by default (opt-out safety).
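A minimal sketch of the opt-out safety described above: sharing a plain Vec mutably across threads is a compile error in Rust, so the synchronized version below is the only one that builds.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // This variant does not compile: you cannot mutate through a
    // shared Arc without a synchronization type.
    //
    // let data = Arc::new(Vec::new());
    // thread::spawn(move || data.push(1)); // error: cannot borrow as mutable

    // The compiling version: all mutation goes through a Mutex, so the
    // Vec's internal invariants hold under any thread interleaving.
    let data = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || data.lock().unwrap().push(i))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    let mut v = data.lock().unwrap().clone();
    v.sort();
    assert_eq!(v, vec![0, 1, 2, 3]);
    println!("{:?}", v);
}
```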
Rust has essentially adopted the C/C++11 concurrency model.
Yes, it is one less failure point to worry about, but in my experience using the respective task libraries in distributed applications, most of the multithreaded-access bugs lie in external resource usage, and those Rust does not prevent.
Ownership/affine typing does allow modelling that, within a single program.
For instance, the type representing a handle to the external resource can have limited constructors and limited operations, that will enforce single threaded mutations or access (including things like "this object can only be destroyed by the thread that created it").
Assuming you are accessing the distributed resource from multiple threads instead of multiple distributed processes, which is quite common in distributed architectures.
Borrow checker doesn't help at all with process IPC.
Plus all ML derived languages have good enough type systems to model network states, while enjoying the productivity of automatic memory management.
At least where C++ is concerned, there are several ways to improve its safety without bearing the cost of a full rewrite.
Using standard library types, actually integrating sanitizers into CI, enable bounds checking (even on release builds) and above all avoid C style coding.
Naturally this doesn't work out for third party libraries that one doesn't have control over.
So from a business point of view it boils down how much one is willing to spend re-writing the world vs improving parts of it.
You can write safe code in C++, but it requires you to go way out of your way and exercise constant vigilance, and it's not always obvious when you mess it up. This is the approach we've been trying for 20 years, and it generally hasn't worked well, because people are both flawed and lazy. This is the thing that's fundamentally different about Rust — it provides strong guarantees by default and requires you to specifically call it out when you're doing something potentially unsafe, so to some degree it turns our laziness into a force for good.
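The default described above is mechanical: anything that could violate memory safety simply does not compile outside an unsafe block, so the dangerous spots are greppable. A minimal sketch:

```rust
fn main() {
    let x = 42u32;
    let p = &x as *const u32;

    // This does not compile: dereferencing a raw pointer requires an
    // unsafe block, so the compiler forces you to call the line out.
    //
    // let y = *p; // error: dereference of raw pointer is unsafe

    let y = unsafe { *p }; // the one greppable spot to audit
    assert_eq!(y, 42);
    println!("{}", y);
}
```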
C++/CLI is basically a set of language extensions, just like clang and gcc have theirs.
Right now it supports up to C++14, if I am not mistaken.
Many don't seem to realise that CLR started as the next evolution of COM, with the required machinery to support VB, C#, J# and C++, alongside any other language that could fit the same kind of semantics.
I agree, and I see a lot of analogies between this situation and the ObjC -> Swift transition which many code bases have gone through. When there are significant semantic benefits in the new language, a slow war of attrition against the old language is a good move.
Alternatively, you could have C/C++ as a third, discrete category somewhere in the middle of the C <---> C++ spectrum. For people who like namespaces, or overloading arithmetic operators for mathy vector/matrix/etc stuff in order to make the code more readable.
I think this is a clever concept and useful article. However, I have to disagree with the severity of this statement:
> at best, the temptation to RiiR is unproductive (unnecessary duplication of effort)
For tiny, mature libraries, this might be true.
But a rewrite in Rust can also vastly improve maintainability. If a library is under heavy maintenance (and will continue to be), your investment in rewriting it is likely to pay dividends by saving developers -- especially ones who are new to the library -- a lot of time.
Another reason I'd prefer things to be rewritten in Rust rather than using bindings is that calling out to external build systems introduces its own set of problems. I've run into more problems with C-based dependencies (slower builds, OOM errors, confusion over static vs. dynamic linking, etc.) than with any pure-Rust dependency.
I'm thinking more about libraries like TensorFlow, ssh, or libgit2.
Yes you could rewrite them in Rust, but considering these are tools which are used all over the ecosystem and have massive momentum behind them, your time would be better spent building cool things on top of existing work than reinventing the wheel and trying to convert the ecosystem to something which will (initially, at least) be an inferior product.
The original statement was deliberately opinionated and extreme, but I still feel like the pragmatic approach of reusing existing libraries instead of rewriting them is the best one for the short/medium term (jury's still out on the long term costs/effects).
There's no way to write a safe interface for a C library like OpenSSH where critical vulnerabilities to malicious payloads have been found and exploited. All this stuff needs to be replaced if we're ever to have a trustworthy foundation.
> The first step in interfacing with a native library is to understand how it was originally intended to work.
But reading code is hard, it's easier to project your sense of confusion onto your predecessor than own it yourself, and once you rewrite the code in Rust and start to understand the true complexity that the previous code had to work around, you'll leave for another job (this time with Rust on your resume).
My life as an immigrant developer, therefore cheaper, hired to fix the shit of the elite quitters :D
Sometimes I catch them a few weeks before they quit and get to ask why they developed a low-level HTTP server in Java interpreting controller code written in JavaScript that nobody understands but them. "It's much more efficient than using PHP or Node.js or higher-level Java." "Did you benchmark it?" "I have to go do more knowledge transfer, bye"...
I don't know what I'd do if they were allowed by the company to use Go or Rust :D I guess I'd redo their wheel reinvention in the majority language of the team... adding to the constant migration I see again and again...
No, it also happens in professional sports. Star players leave the teams that drafted them to look for better opportunities. It seems to be an inevitability that small market teams draft and develop star players who then quit on the fans for a big contract or a chance to play with other stars in the big city.
That's true but the difference of course with a star player is that the hiring team can examine in detail the past performances of the player. They usually only hire if the player is competent.
The problem is that the elite quitter is usually incompetent. They are capable of learning some new things. But they are incompetent in the sense that they are incapable of seeing that the new thing is often the same as or worse than the old thing. They are incompetent in the sense that they have a low attention span and do not follow up on the things they do so they never learn from their mistakes. They often love complexity etc etc
So the hiring team cannot see these details directly. It's up to them to have competent interviewers to flush this out. But of course they are often in awe of the latest fad themselves and so on it goes.
Sports players have a limited amount of time to exploit their bodies.
Developers have a much longer lifetime of continuous learning, opportunities to get better, and focus on being the best they can be.
But at the end of the day, when a developer or player no longer offers their organization value, they will be cut loose. This is hugely destructive to that person's ability to care for themselves and their families. Is it any wonder that they want to do whatever they can to secure a stable future for themselves?
Don't forget the key ingredient in this recipe: they show a small burst of greatness or give a couple shining glimmers of hope and then capitalize on that small sample size.
If you could be rewarded with an incredible salary for sticking at the same company for 10 years and making a great, reliable platform using "boring technologies", then more people would probably do it.
Instead, to raise your salary you've got to jump jobs every two years.
> If you could be rewarded with an incredible salary for sticking at the same company for 10 years and making a great, reliable platform using "boring technologies", then more people would probably do it.
> Instead, to raise your salary you've got to jump jobs every two years.
I believe the majority of America stays at their employers. It just seems to be the en vogue style for this group. I've made careers at my past two employers. I like to think I've made out very well by sticking to the same company for 4 years, going on 5, much better off than if I had joined a startup working slave labor for much less total compensation, only to have my shares diluted once an exit happens.
I've done it all at this point in my career, and sticking to a stable, cash-positive business is the best option at this point, IMO of course. Everyone likes to think they'll be the special 1% to make that big exit but then again our generation (millennials) were raised to believe we were special so it makes sense why people go chasing the dollar.
Your employer takes a little more time to adjust, but in the end, people end up where they should be over time.
As long as companies keep expecting experience in very specific technologies (only at work—400 hours on a hobby codebase don't count for as much as 40 hours writing something at a job for determining competence in a language or framework or whatever, it seems) this will keep happening. It's really, really dumb, but there's little other choice for folks who want to keep their skills (=the names of tools & languages they've used to write something for pay) up to date (=trendy high-paying buzzword compliant).
Doubly true if you're not all-in on one of the major Bigcorp silos (Java, C#).
At a previous job I worked with someone who didn't see a buzzword they didn't like. Since nobody was stopping them, they designed an incredibly convoluted system using all the latest tech. In the end we replaced it with a JVM server and a Postgres database.
I occasionally check their LinkedIn and they're still spending 12-15 months every time at companies, being CTO or Big Data Strategist or whatever.
anecdata but I've only had terrible experiences with companies looking for specific technologies (e.g. a job listing of "$LANGUAGE Developer").
Mostly good experiences when it's a specialist gig that happens to require a specific technology. Like some parts of a stack aren't interchangeable and aren't easy to pick up overnight.
"A much better alternative is to reuse the original library and just publish a safe interface to it."
I think I must reject the premise a bit, unless I misunderstand how Rust works.
If you write a safe wrapper around unsafe code, is it not still the case that if the unsafe code bombs out, it will take the safe Rust down with it, engulfed in shared flames?
(Unless you spawn unsafe code in its separate process or something elaborate like that.)
(Then again, from a practical standpoint the reasoning may be good. Maybe, even likely, the C library is very good and very battle tested. And very likely, my would-be novice re-implementation in Rust would not be very good.)
A lot of the time, the danger with unsafe code is doing a thing like passing a null pointer into a function that can’t handle null pointers, or using something before it is instantiated. So the interface should force those things not to happen (with the type system or a constructor or...). This doesn’t prevent the c code from being buggy, but it makes it much easier to use, since you can ensure that you’re maintaining the invariants.
Also even if you do plan to re-implement the C library in Rust it's still a good idea to do this step (make a safe wrapper around the FFI bindings) first. It lets you set up your Rust test cases, and gives you a step to design what the Rust API will look like. You can proceed to pull bits of the C out and into native Rust, all while whatever you've built on top doesn't have to change.
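As a rough sketch of what such a safe wrapper can look like, here is libc's strlen standing in for an arbitrary native function (the wrapper name and the choice of strlen are illustrative, not from the article):

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// FFI declaration for a C function; strlen is a stand-in for any
// native library call you might wrap.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// Safe wrapper: the types guarantee we always pass a valid, non-null,
// NUL-terminated pointer, so the `unsafe` block upholds strlen's
// preconditions.
fn native_strlen(s: &str) -> usize {
    let c_string = CString::new(s).expect("input must not contain interior NULs");
    unsafe { strlen(c_string.as_ptr()) }
}

fn main() {
    assert_eq!(native_strlen("hello"), 5);
}
```

Callers never touch a raw pointer; the invariants the C code relies on are enforced once, in the wrapper.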
One big difference between this approach and just using the original library directly is that you now have a type system that can enforce more invariants than is possible in C, invariants the library was previously already relying on. If the (now reduced) API surface still happens to expose a bug, it can be fixed in the library and/or protected against in the API.
Summary:
- Rather than rewriting a C++ library in Rust (difficult and will introduce bugs), wrap it in a Rust interface.
This seems like the cleanest way to move a codebase forward without too much regression work. It reminds me of how TensorFlow 2.0 still contains tf.compat.v1 APIs so devs can write forward-facing code while maintaining existing modules.
You'll need to do this anyway if you want an actual library that can be independently packaged, because that means you have to use the system ABI in the library itself. Rust does not provide a stable ABI.
I don't think I've ever met a developer who didn't want to rewrite their predecessor's code and that includes the one I see in the mirror. Doing it in a different language would definitely be more interesting.
Do we admit that this is a factor in the decision to rewrite vs. reuse?
"I don't think I've ever met a developer who didn't want to rewrite their predecessor's code and that includes the one I see in the mirror"
Allow me to introduce myself ;) I mostly work on new products, and rewriting to me is just a waste of time. I am interested in creating features that give actual ROI. The only time I rewrite a piece of code is if said piece presents a major problem (bugs, performance, etc.).
Generating C/C++ bindings in Rust is unfortunately still quite a hassle in my experience: C libraries often end up with important functions as static inline in header files (these do not get translated automatically by bindgen). If the library uses the macro system to rename functions, these too won't show up as symbols in your library (so you'd have to manually define those names as well). Then bindgen often pulls in too many dependencies, and I end up manually white/blacklisting which symbols I want/don't want (and tracking changes through multiple versions). That being said, it's a hard problem, and Rust still has some of the better tooling for FFI compared to other languages I've seen.
Where are you getting your CHM files? If you're getting them from untrusted sources — that is, people you wouldn't give your password to — you probably don't want to pass them to a buggy C library. That's just begging to get pwned! The only way this seems like a reasonable idea is if your CHM files are generated by your own software or by something like SafeDocs: https://www.darpa.mil/program/safe-documents
Wrapping your buggy C library in Rust isn't going to give you the kind of security against malicious data that a rewrite in Rust would give you.
The tree only contains configure.in, not the generated configure output, and Makefile.am, not the generated Makefile.in. You do need the entire autotools suite installed for it to build.
Why they published it like this is beyond me, however. The whole point of autotools is not making your users install them (unlike CMake and virtually every other build system for C there is).
"It" in my post referred to the Rust cargo package in question. The package ships configure.in, but not the generated configure. You do need autoconf for configure.in -> configure. Similarly, it ships Makefile.am, but not the generated Makefile.in for use with configure.
Note how I said "The whole point of autotools is not making your users install them".
Same as a npm package is usually js code and doesn't require you to run the typescript compiler, a dist tarball shouldn't require you to run autotools but just contain their output.
No, parent answers the question, "What is the difference between the git tree and a dist tarball?" Which is self-evidently (and unhelpfully) "you don't need autotools installed with the dist tarball". My questions are: Why does this difference exist? Why would you not track everything necessary to build a library in git? How is the dist tarball built differently than just zipping the current git tree?
Autotools is rarely portable. I've been doing a lot of cross compiling recently: the vast majority of autotools projects can't be cross compiled. Sure autotools itself supports it, but something required to make it work didn't get connected up and so you can't do it.
If you want to natively compile on a fairly recent linux with the common standard libraries autotools works well. Even though autotools was written to work around these differences, in most cases the developer didn't hook into autotools detecting that difference and so it doesn't work.
Embedded Linux on arm. Nothing very far out really. Note that I'm careful not to blame autotools, it is the user's fault, there are a couple projects that use autotools and cross compile easily. The vast majority do not.
CMake based projects almost always cross compile easily.
I still have issues with Rust string handling. Half of the time I have no idea how to get strings out of libraries. I don't understand the intentions of the language creators at all. Does anybody have a great intro to Rust for software engineers who come from Python or something similar where high level types are the norm?
All of Rust's complexity for strings comes out of complexity due to UTF-8. Well, a tiny bit comes from the fact that Rust has pointers too, but most of it is UTF-8.
Did you happen to read the book? We cover strings early because this is a common pain point.
Also, if you have something more detailed than "get strings out of libraries", I can give better advice. It's tough to tell what the actual issue is.
Ah, I see! So yeah, this isn't something the book covers, so that would not be helpful.
In this case, the hash ends up being raw bytes. So the question is, what are you trying to do with this hash? Being a bunch of bytes, this may not be valid UTF-8 (and in this case, is not ASCII, let alone UTF-8). One option is to encode to base64:
    use sha2::{Sha256, Digest};
    use base64;

    fn main() {
        let mut hasher = Sha256::new();
        hasher.input(b"hello world");
        let result = hasher.result();
        let encoded = base64::encode(&result);
        assert_eq!("uU0nuZNNPgilLlLX2n2r+sSE7+N6U4DukIj3rOLvzek=", encoded);
    }
But yeah, you can't exactly "get a string" out of a random bag of bytes unless you know how you want that bag of bytes to be represented, encoding wise. That's not exactly a satisfying answer, but such are strings!
This seems to be a conceptual mistake that often happens when coming from high level languages, I have seen similar confusion on stack overflow relating to packet based network protocols.
What many low-level data formats deal with are raw bytes ("byte strings" in some contexts). The output of a hasher is random binary data.
What most humans are accustomed to are some form of encoding that makes it easier to spell out the octets. Hex and base64 are common. But they are not the native representation that the machine deals with.
With almost everything being explicit in rust you need an explicit conversion from bytes to hex, base64, urlencode or some other format.
Those conversion methods (with their contract often expressed as a separate trait) may be provided by the same crate or you may have to pull them in from a different crate.
A hasher does not need to provide this itself because its output types can be enhanced by foreign traits.
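As an illustration of how explicit such a conversion is, hex encoding needs nothing beyond the standard library (the `to_hex` helper here is made up for the example; in practice crates like `hex` or `base64` provide this):

```rust
// A minimal hex encoder: each raw byte becomes two hex characters.
// Crates like `hex` offer this off the shelf with better performance.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    let raw = [0xde_u8, 0xad, 0xbe, 0xef];
    assert_eq!(to_hex(&raw), "deadbeef");
}
```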
This is helpful, but in general, what is the problem with having strings and trivial functions to convert something (like bytes) to strings? Maybe I am missing the point.
Because converting a type to a string isn't trivial. It needs to be defined for that type.
Rust does have a pair of traits that can be used, Display [1] and Debug [2]. Display must be implemented manually by the author of a type, while Debug can be automatically derived if all the members of a type implement Debug. Like this:
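A minimal sketch of the two traits (the `Point` struct here is illustrative):

```rust
use std::fmt;

// Debug can be derived because every field implements Debug;
// Display must be written by hand.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{:?}", p); // derived Debug: Point { x: 1, y: 2 }
    println!("{}", p);   // hand-written Display: (1, 2)
}
```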
The process of "functions to convert bytes to a string" is called "encoding". The problem is, there's no universal way to do it, so you need to choose one. You need to know what the bytes mean, and then convert them into whatever representation that you need.
It is unfortunate that you’ve been downvoted for asking a question.
Happy to help. Feel free to post questions on users.rust-lang.org as well; people are super happy to elaborate on any question, no matter how big or small.
Rust doesn't have varargs, so formatting functions tend to be macros (allowing you to pass in multiple arguments, parsing the format string at compile time to allow type checking, etc.), so they are generally not that trivial.
There are some examples of printing output using formatting strings in the RustCrypto readme:
    let hash = Blake2b::digest(b"my message");
    println!("Result: {:x}", hash);
If you're looking to get the actual string value rather than just print it out, then take a look at the format macro. It uses the same format strings / traits etc as println, so wherever you see println you could drop in format instead:
    let hash = Blake2b::digest(b"my message");
    let text = format!("{:x}", hash);
It sounds like you're coming from a language (like C) where a string is just an array of bytes or a language (like Ruby or Javascript) where strings are seriously overloaded.
In Rust a string is a UTF-8 sequence. Typically human readable. A byte is generally represented by the u8 type, and a collection of bytes is either an array or vector (e.g. [u8] or Vec<u8>). To create a human readable form of an object in Rust you'd typically use the Display ({} format specifier) and/or Debug traits ({:?}). A string (e.g. &str or String) is NOT a collection of bytes. There are exceptions however with OsString/OsStr representing something closer to a collection of bytes and CString/CStr representing a bag of bytes.
Your comments seem a bit XY-ish to me. What are you trying to solve by converting things to strings? For debugging or human readable output Array, Vec, and u8 all implement the Debug trait. u8 also implements UpperHex and LowerHex so you can get a hex formatted version as well (e.g. format!("{:02X}", bytes)).
For the (MD5) hash case, as others have pointed out you're looking at a base 64 encoding which you'll often have to do on your own depending on the library you're using.
Now if you're struggling with owned vs borrowed strings that's a whole other matter. You can insert some magic into your functions with generics and trait constraints and into your structures with the Cow type (clone on write).
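A small sketch of the Cow pattern mentioned above (the `sanitize` function is made up for illustration): the function only allocates when it actually has to change the input.

```rust
use std::borrow::Cow;

// Returns the input unchanged (borrowed) when possible, and only
// allocates an owned String when a modification is needed.
fn sanitize(input: &str) -> Cow<str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    assert_eq!(sanitize("no_spaces"), "no_spaces"); // borrowed, no allocation
    assert_eq!(sanitize("has spaces"), "has_spaces"); // owned copy
}
```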
since we know that Sha256::new() will always produce 'valid' characters for utf8 you can just do
    let my_rust_string = str::from_utf8(result).unwrap();
If there is a possibility of an error, (you read it from file, network ...) you can do:
    let my_rust_string = match str::from_utf8(result) {
        Ok(str) => str,
        Err(err) => panic!("Invalid utf8: {}", err),
    };
Or you can do a lossy conversion (it will put that question mark on chars that it can't decode):
    let my_rust_string = String::from_utf8_lossy(result);
edit: OK, so I read the sha2 API wrong. result() is raw bytes and result_str() is a hex-encoded string. The above won't be much use for either, and others have already explained what to use.
I just started programming in Rust several months ago, and that was my pain also, since where I live we still have other encodings in use.
It’s like that because strings are very difficult to get 100% right even in gced languages (Python 2 says hello) and when you take into account ownership and different standards used even in the same OS it all becomes a nightmare. You have to know when you want a reference or an owned object, when you want mutable vs immutable and when you want portable vs OS specific, and god forbid you put a path in a string. This is why rust has multiple string type families, all of which (well, most) Python very conveniently hides, for good reasons.
Thanks, it would be great to cover all these types with common scenarios (like using a hash function, storing something in a database or things that are the most common in programming). If it is a good idea to hide those things then we need a library that does exactly that for Rust. Don't you think?
They’re good reasons for Python but not necessarily for Rust, where you are expected to care about such details by design. It’s one of the language’s differentiators.
Nonetheless a cookbook for converting strings for different use cases isn’t a bad idea at all.
In another language you'd risk having a pointer to invalid memory when the function returns if the pointer escapes the function, and so it is advised to use pointers to objects on the stack sparingly if at all.
In Rust, the compiler will statically determine whether a pointer to an object on the stack could escape the function, and if so it will fail to compile.
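A sketch of what that looks like in practice (the function names are illustrative): the escaping version is rejected at compile time, and the fix is to move ownership out instead of borrowing.

```rust
// Rejected by the compiler: the reference would outlive the local `s`.
//
//     fn dangling() -> &String {
//         let s = String::from("hello");
//         &s // error: missing lifetime specifier / returns reference to local
//     }
//
// The accepted version moves ownership to the caller instead:
fn no_dangle() -> String {
    let s = String::from("hello");
    s // `s` is moved out; nothing is left pointing at dead stack memory
}

fn main() {
    assert_eq!(no_dangle(), "hello");
}
```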
>so it is advised to use pointers to objects on the stack sparingly
At least in the C/C++ world, this is not true. Where possible, we prefer stack allocated objects. Stack allocation is fast and usually gives you better cache performance than heap allocation.
Rust's lifetime/borrow checker provides compiler support for well-established best practices amongst professional C and C++ users.
It looks like you are arguing about what "sparingly" means in that context, because the rest of the sentence is totally valid.
Stack and Heap allocation in C serve different purposes and there's really no rule of thumb to choose one and there's not even one case more frequent than the other.
Allocate objects on the stack, it's faster. Unless you need a reference to one in another context. Or it's too big. Or you're going to share it between threads. Or whatever.
> In another language you'd risk having a pointer to invalid memory when the function returns if the pointer escapes the function, and so it is advised to use pointers to objects on the stack sparingly if at all.
Just a nitpick, this is only true for non-garbage-collected (or refcounted) languages. :-)
A garbage-collected language wouldn't let you allocate such items on the stack in the first place.
GC languages tend not to let you choose where to allocate in the first place, and have obligatory heap semantics for reference types. Some compilers (including Go and Java, to my knowledge) will attempt to optimize the implementation to stack allocation when possible using escape analysis, but this is less precise, and more opaque, than Rust's allocation. And you won't get feedback if it stops working.
So, by GP's comment of "unheard of (or just plain dangerous)", the "unheard-of" applies to GC languages and "plain dangerous" would apply to non-GC languages.
I prefer the term "obligate-GC languages": you don't get to choose whether GC runs. Otherwise, Rust and C++ count as "GC-enabled", and comparisons are vacuous.
Reference-counting, a form of GC very commonly used in Rust and C++, is inefficient compared to more intrusive schemes, particularly where there may be contention for the count, but may be applied selectively, e.g. never on critical paths, so that the inefficiency has zero impact on overall system performance.
Obligate-GC advocates like to point at custom benchmarks showing overhead at a small percentage. They are invariably lying, by reporting only the time that the profiler says the program counter is pointing to GC code, and hiding the much-larger results of loss of cache locality on the system as a whole. Typically they don't even know they are lying, which should not give one much confidence in their engineering judgment.
You mean custom benchmarks like Midori powering Bing for Asian requests, the ixy paper, Android GPGPU debugger, Fuchsia TCP/IP stack, ChromeOS and Google Cloud sandboxes.
They admit it wasn't quite full-GC stuff. It was close to the C++ version. It did use the language's safety and parallelism to its advantage on top of the Midori OS. They ended up improving performance over the original.
Still worth mentioning even if not fully-GC. I mean, they could've always used a real-time GC if that was important. There's already commercial and academic ones. For some reason, these projects never try to do that. I think even those developing GC'd systems might not know about RT designs.
> A garbage-collected language wouldn't let you allocate such items on the stack in the first place.
I get your point, but I don't think this is entirely true.
When people think of call stacks in the general sense, there's a major distinction between CPU stacks and managed stacks.
For example in both Java and Go, stack allocations are determined at compile time, and heap allocations at runtime. (Obv, stacks here are not actual CPU stacks, just managed memory stacks depending on the implementation.)
You omitted an important part: the pointer pointing to the stack. In GC'd or refcounted languages, (almost) everything is on the heap (occasional exceptions being primitive types like integers). This of course leads to worse performance because of an additional dereferencing step and cache misses.
Which garbage-collected languages put automatic variables in an actual stack that gets bumped on function calls even when you pass pointers to the automatic variables ?
In garbage collected languages, are there any objects on the stack? I’m not familiar with GC implementations, but at least theoretically everything is allocated on the heap. Maybe some implementations keep things on the stack as an optimization?
Escape analysis is a common optimization for GCed languages.
e: Also, many languages allow the user to create user-defined value types, which are copied rather than referenced. Variables containing these can be stack-allocated.
Unheard of might be overstating it, but in Rust you can do things like pass references to other threads and aggressively parallelise things (without lots of defensive copies) without worrying about it causing bugginess later.
Our job is not to program crap again and again, it's to deliver some value for the high fees we charge. Rewriting a working library to migrate languages is really hard to defend...
No it’s not. You didn’t write it the first time, and you grow from writing it the first time. Writing and shipping something in a new programming language is a fantastic way to learn both. This has only become contentious as people try to come up with a reason why rewriting things in Rust is bad, when really it is about as benign as trends go. I never heard the same complaints about rewriting things in Go (another language I love, btw), and even JS somehow had fewer naysayers.
You may question the usefulness of having “x but in Rust” and that is fair. However, people keep answering the reasons why you might do something like this and yet like amnesia the same bad opinions come out again next time.
Rewriting software written using Python, Java, and Ruby in Go is entirely different from rewriting software written using C and C++ in Rust. Go has easily quantifiable advantages over those other runtimes regardless of the original code's correctness.
People rewrite C and C++ software in Go too. In fact, unlike Rust, most Go software has no C dependencies at all. This is precisely the result of rewriting everything in Go.
To be clear I’m not suggesting you should always rewrite something versus just wrap it, not at all. I’m just saying the reasoning for not doing it being “you didn’t write the original so what do you know” is very offensive to me; how are you supposed to learn?! People write NES emulators all the time even though they already exist. While most are purely for practice it doesn’t really matter: they wanted to write it, they wrote it, they learned. Discouraging people from writing something that they want to write is upsetting to me.
Most of the common C dependencies were implemented in Go before its public release - https://golang.org/doc/go1. These implementors are exactly the "better-equipped" original authors to which the blog post was referring.
Most programmers are programmers for hire, and it would be a waste of everyone's time to re-implement (for example) libdispatch in Rust. If someone wants to write it on their own time for their own edification and no other purpose, fine -- but half-assed internal implementations of common functionality are a significant drag on the lives of other working programmers.
All of this has absolutely, truly, nothing to do with rewriting something in Rust. You could also rewrite the same library again in C and have the exact same conundrum.
Also, people build dependencies on effectively “hobby” projects all the time. Open source in particular works well this way because the dependents have a good reason to contribute back; the users become stakeholders and contributors. The same may not be true of internal software, but again, this has little to do with the concept of rewriting something in Rust.
This is really diverging into a discussion that has nothing to do with the original paragraph I addressed.
The linked RiiR post from 2016 pretty much described the behaviour of the people shown in this thread. It is hilarious to see the arguing for RiiR in this context.
Rewriting in some language is a matter of skill and preference. To even consider a rewrite there should be a rational reason, i.e. it is broken or can't work in the ecosystem.
For fun and learning you can do what you want, of course.
I have rewritten my small-ish image processing library [0] in Rust (from C) and liked it! I mainly wanted to learn, also Rust forced me to improve the API a bit.
The title of the article is “How to not RIIR”, not “How not to RIIR”. The article is about wrapping an API with a Rust interface so the title used here on HN does not fit the content of the article.
It's subtle. The second phrasing ("how not to x") is idiomatic, and means, roughly, "some things that you should not do when you do x", or "an example of how you should not go about doing x."
So, an article titled, say, "how not to paint your house" would be expected to outline things to avoid when painting your house, and "how to not paint your house" could just be "go for a bike ride instead."
Surely "How Not To..." is typically used when you're talking of a bad way to do something, or a way to fail - so this sounds like it's describing a bad way to rewrite something in Rust, which is not the case.
It also isn't the wording used in the original article title.
I don't disagree with you that the thesis of the article is "don't re-write something in Rust."
What I was trying to say is that I think the title is making a joke; it's saying "how to re-write something in Rust badly", because well, it's not re-writing it in Rust at all.
I thought the title was making a joke, but then I read the article, and IMO the actual article title of "How to not RiiR" is much more indicative of its content than the HN title of "How Not to RiiR".
Oh no, it's not available, maybe you could rewrite your website in rust to handle the load? ;)
...this is a tongue-in-cheek comment. I do agree with the author: the "Rust rewrite" alternative of wrapping the C library is way better in many cases for well-established or legacy libs. However, it depends on the momentum you can bring, as a pure Rust implementation of widely used functionality can benefit from the compiler grinding through it. I can see Rust-written low-level libraries used across all high-level languages in the near future.