The author touched on the value of the original author, and this is something missing from the RiiR projects I've encountered. An open source project is more than just code. It's the community: the original authors and maintainers and their years of experience in that problem space. If you don't have a plan to move the community over to Rust (and the RiiR projects I've seen did not) you're (A) duplicating effort, (B) ignoring the value of engineering experience, and (C) making a major decision for an open source project (rewriting it) without collaboration with the current maintainers. (C) is anti-collaborative, and an extreme measure that should only happen in extreme circumstances (not "just because it might be better").
At my employer, the number one reason (I'm aware of) to open source our internal software is to solicit collaboration from other companies. Take away collaboration, and it's going to be a harder sell to open source anything.
> (C) is anti-collaborative, and an extreme measure that should only happen in extreme circumstances (not "just because it might be better").
I agree with A and B, but C just feels like gate-keeping. Open source licenses do not require collaboration with the original authors. You can freely fork any open-source codebase and modify it as you wish without asking anyone's permission.
You can... and certainly cloning a git repo or hitting the "fork" button on GitHub is a normal part of the development workflow. If you're experimenting, those changes may be discarded. If you're contributing, those changes go back upstream. But I don't think either of those is what the author means by fork. Other folks have talked about this. [1] [2]
I think the author means: if you're maintaining changes in your fork indefinitely and advertising this fork to others as superior to the original to use or as a canonical place to file issues, you should have a good reason. Communities are valuable; please don't divide them or throw them away lightly. Try working with the original maintainer first.
It may be superior if there's ongoing maintenance and improvements on the original code.
RiiR certainly has its place, though. Ripgrep, for me as a user, is vastly superior to grep, and a large reason for that, as I see it, is the safe optimizations allowed by the borrow checker. They're certainly possible in C, but they wouldn't be maintainable.
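A hedged illustration of the kind of optimization the borrow checker makes safe to maintain: returning zero-copy slices into the original buffer, which the compiler guarantees cannot outlive it. (The function name here is my own for illustration, not ripgrep's actual API.)

```rust
// Return matching lines as zero-copy slices into `haystack`.
// In C, handing out interior pointers like this is easy to get wrong
// under later refactoring; in Rust the borrow checker rejects any use
// of the returned slices after `haystack` goes away.
fn matching_lines<'a>(haystack: &'a str, needle: &str) -> Vec<&'a str> {
    haystack
        .lines()
        .filter(|line| line.contains(needle))
        .collect()
}

fn main() {
    let text = "foo bar\nbaz\nbar qux\n";
    let hits = matching_lines(text, "bar");
    assert_eq!(hits, ["foo bar", "bar qux"]);
    println!("{:?}", hits);
}
```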
Also, I'd look at how stable the library you're wrapping is and the size of its code base. It may be lower effort to rewrite than to wrap.
A good reason is that your fork is superior _and_ you have exhausted reasonable efforts at sharing the improvements with upstream. There's some threshold of how superior but it's subjective.
I think the author's point is not that you can't RiiR, it is that one should think twice if the resultant code is not supported. If there is a Rust community around it, great! But if it is just doing it just to RiiR, then (C) holds.
I did not mention anything about GitHub. I am talking about what you can freely do with any open-source software without being made to feel like you are breaking the law or doing something immoral. If you aren't comfortable with people building on top of or making derivatives of your work, do not open-source it.
You are free to fork, but that doesn't make it right.
Imagine a well-funded startup trying to make a name for themselves decided to fork the top 10 emerging open source projects, put their company name on them, and then spend millions in PR/marketing so that (A) it seems like they invented them, and (B) they fork the community. Are they free to? Is it right? What will it mean for the original authors and experts who created the software?
Forking for whatever reason is in the spirit of open source. Asking if something is right is nonsense. From an ethical point of view, it follows the license of the project so there is no issue.
If the author of the original source had an issue with the above scenario, they would've chosen a more restrictive license that didn't make it possible.
> From an ethical point of view, it follows the license of the project so there is no issue.
This seems like the opposite of the usual ethical/legal distinction. It’s definitely legal to fork the project and the community, but not necessarily ethical.
I don’t think you can blame authors for not using a license that describes exactly what they are happy with. What constitutes an “ethical” fork is subjective, based on how necessary the fork is, and I don’t think a license could describe that.
Also, consider this analogy. The MIT license doesn’t contain a patent grant. So if some MIT-licensed software uses a patented technique, the software author and patent holder is within their legal rights to sue all forks of their software for patent infringement. But this would be unethical, as the MIT license implies that the licensed software is given freely, not that you will sue all users of your software. Would you blame the sued parties for choosing to use MIT-licensed software, the way you are saying you would blame software authors for choosing a license that doesn’t forbid certain forks?
It is gate keeping. Pretty dismayed to see who is peddling it here. Corporate sycophant now I guess.
Really smart computer nerds emotionally wrapped up in change they aren’t interested in. They know what the future needs! To stay just like yesterday!
Gen X has become the old geezers who want you to stay off their lawn.
Oh what? It may not be super popular like Linux? We all have to work on the same code bases? Individual drive and curiosity verboten! Your one person project won’t satisfy the general user base? You should just focus on computing as we see it.
Who cares. Generate whatever syntax you want. They don’t have to organize around the semantics if they don’t want.
This is a good point. And unfortunately, in a long-lived language the network effect means you have almost no hope of moving a whole community over to a new language at once. Just look at what happened (or rather, didn't happen) with Python 2 libraries. And that was a less-old and less-permeated language than C(++).
So maybe instead of "rewrite it in Rust", the answer is "build a new solution in Rust that meets the needs of a community of people who already use Rust".
(1) You have perfectly good software and your only motivation to rewrite is infatuation with how great Rust seems. In this case, yes, that's unproductive. (Infatuation-driven design is poor engineering and is a fairly widespread problem in the software industry.)
(2) You already have other reasons to want to rewrite (design could be improved, code is in a poor state, etc.), and you're deciding between languages. In that case, a rewrite could be productive even if you didn't use Rust.
(3) You have mature software that appears relatively free of obvious, known bugs, but it's written in an unsafe language, and it would be valuable to have the additional confidence that a safe language could provide. Security-sensitive or other critical applications are where this is most likely to make sense.
How about (5): you're trying to learn Rust and you need a project to work on? Irresponsible for production code, but perfectly valid for a weekend project. And in my experience, it can be a great way to learn the quirks of a particular language!
Yeah, sometimes that makes sense. If the lib in question is small enough, it might be more valuable to port, to avoid having a multi-language project and the effort of the FFI.
You can compile Rust in one go. If you rely on native libraries, you need to be able to link them during cargo build, and that's painful to set up. Not impossible... just really annoying to manage.
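For illustration, linking a native library usually means a build.rs along these lines -- Cargo runs it before compiling and reads its stdout as linker instructions. The library name "foo" and the search path are placeholders; making them resolve correctly on every build machine is the annoying part. (Factored into a helper here just to make the directives visible; a real build.rs would typically println! them directly.)

```rust
// build.rs (sketch): Cargo parses lines printed to stdout as
// build instructions. "foo" and the path below are placeholders
// for whatever native library you depend on.
fn directives() -> Vec<String> {
    vec![
        // Where the prebuilt native library lives (platform-specific).
        "cargo:rustc-link-search=native=/usr/local/lib".to_string(),
        // Link against libfoo; dylib vs. static is another knob to manage.
        "cargo:rustc-link-lib=dylib=foo".to_string(),
        // Only re-run this script when it changes.
        "cargo:rerun-if-changed=build.rs".to_string(),
    ]
}

fn main() {
    for d in directives() {
        println!("{}", d);
    }
}
```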
I'm not positive, but I think GP intended to refer to the idea that you would need multiple compilers to build the project, rather than cross-compiling to another platform.
> (2) You already have other reasons to want to rewrite (design could be improved, code is in a poor state, etc.), and you're deciding between languages.
It would be wise to ignore the language distraction and remember that most urges to rewrite are just flat out wrong. The standard and well-respected advice about rewrites doesn't go out the window just because of a fad language; if anything, it applies more strongly.
Much of what people perceive as cruft is actually most of the value -- the stored knowledge from years of experience using the software in practice. Few "clean" programs are actually complete.
Mature software can have a bunch of hard to fix bugs that you can't motivate anyone to work on any longer. If you want to continue making progress you have to route around the damage somehow.
There is an automated tool, c2rust[1], that automates a huge chunk of the translation from C to Rust. Moreover, it has a refactoring tool[2] that is scriptable in Lua. One good example of such a conversion (still not finished, though), while still producing the same output, is a rewrite[3][4] of the XeTeX engine in Rust (as part of the Tectonic[5] engine). It is able to parse the arXiv articles dump and still generate valid PDFs.
There is some concept[6] of converting Java to Rust, but it is far from being as useful as the c2rust tool. It would be nice to integrate it with c2rust somehow, if someone wants to help.
I wrote a comparison of the quality of this autogenerated C2rust code versus the original sources (WEB code in this case), the last time this was posted: https://news.ycombinator.com/item?id=21176806 (Though it's not a fair comparison as this particular automatic translation itself started from automatically translated C code, for unclear reasons.)
Last time I looked at it, c2rust translated from C to unsafe Rust, right? I'll point out a (neglected) project[1] to (partially) auto convert from C to a safe subset of C++. For example, if you need a png encoder/decoder library written in C++, perhaps the safest one is here [2].
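To make the distinction concrete, here is a hedged sketch (not actual c2rust output) of the shape a mechanical C-to-Rust transpile takes -- raw pointers, explicit lengths, unsafe -- next to the safe rewrite a human would produce afterwards:

```rust
// Roughly the shape of a mechanical transpile: raw pointers, manual
// indexing, and unsafe. Illustrative only, not real c2rust output.
unsafe fn sum_transpiled(data: *const i32, len: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < len {
        total += *data.add(i); // no bounds check, caller's responsibility
        i += 1;
    }
    total
}

// The idiomatic safe rewrite: bounds are carried by the slice type.
fn sum_safe(data: &[i32]) -> i32 {
    data.iter().sum()
}

fn main() {
    let v = [1, 2, 3, 4];
    let a = unsafe { sum_transpiled(v.as_ptr(), v.len()) };
    assert_eq!(a, sum_safe(&v));
    println!("{}", a);
}
```

The value of the automated step is that both versions can be differentially tested against each other while the unsafe one is incrementally replaced.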
This article is wonderful, and I hope it's a setup for a second article: "How to Rewrite it in Rust"!
> However at best, the temptation to RiiR is unproductive
> A much better alternative is to reuse the original library and just publish a safe interface to it.
Just as models are a lower-dimensional representation of a more complex problem,[1] I have to reiterate that there are no truth(ism)s in software; as in all engineering, there are trade-offs. RiiR can be, and often is, a valid choice. The author talks about "introducing bugs", but under an engineering approach to a rewrite -- more like a language port -- rewrites can _find_ a lot of bugs. One such way is as follows.
1. Keep the same interface in the new system; client code should work against either system.
2. Have a body of integration tests; capture these from the field, OR write a small collection of orthogonal tests, somewhere between unit and integration.
3. Use the tooling (bindgen, etc.) and generate as much of the interface programmatically as possible.
4. Iterate on the port, doing differential testing against both systems.
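The steps above can be sketched as a differential test: run the same inputs through the original implementation (here a stand-in function; in practice the C library called through bindgen-generated FFI) and the port, and compare. The checksum functions below are placeholders for whatever interface the two systems share.

```rust
// Stand-in for the original C implementation (in practice this
// would be an extern "C" call through bindgen-generated bindings).
fn checksum_original(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

// The port under test, kept behind the same interface (step 1).
fn checksum_port(bytes: &[u8]) -> u32 {
    let mut acc: u32 = 0;
    for &b in bytes {
        acc = acc.wrapping_mul(31).wrapping_add(b as u32);
    }
    acc
}

// Step 4: differential testing against both systems. In practice the
// inputs would come from field captures or orthogonal tests (step 2).
fn main() {
    let inputs: [&[u8]; 4] = [b"hello", b"", b"\x00\xff\x7f", b"captured from the field"];
    for input in inputs {
        assert_eq!(
            checksum_original(input),
            checksum_port(input),
            "divergence on input {:?}",
            input
        );
    }
    println!("all inputs agree");
}
```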
Doing a language port is comparable in work to long-term maintenance and refactoring of an existing codebase. As the tooling gets better, RiiR will become a smoother oxidization.
If you own the C/C++ that could get rewritten, I'd think one needs to rationalize NOT doing RiiR: better tooling (IDE, build), improving perf tooling, low bug count, increased team velocity, etc.
But the biggest reason to RiiR is safety. Integrating a body of C/C++ code into your Rust codebase introduces a huge amount of unsafe code, much worse than surrounding your entire Rust program with unsafe { }.
At least do what is outlined in the article but ALSO compile the native code into Wasm and run it from within a sandbox.
Edit: something like a library for reading (parsing) a file format should absolutely be RiiR'd or run from a sandbox. Data parsing and memory corruption vulnerabilities are excellent dance partners.
100% agree. We had a C++ service that made heavy use of libcurl. A particular release of libcurl introduced some memory safety problems that caused frequent segfaults for us. These memory safety bugs were eventually fixed in another release, but it scared us enough that we investigated rewriting the service in Rust.
After successful experimentation with some prototypes, we eventually rewrote the service in Rust, auditing the unsafe code in our dependencies (of which there was very little). No segfaults ever since.
Side benefit: Since Rust networking libraries tend to have strong async support, and since some of our C++ libraries performed synchronous networking operations, we saw a big improvement in performance. The number of threads needed dropped by 5x.
> But the biggest reason to rewrite it in Rust is safety.
This is not a good reason if there are cheaper ways to get safety (at least for existing codebases). Indeed, there are sound static analysis tools (like TrustInSoft) that guarantee no undefined behaviour in C code. Using them is not completely free, and it may even require adding annotations to the code or even changing the code, but it does seem significantly cheaper than a rewrite (in any language). Such sound static analysis tools are already being used for safety-critical systems in C, and I believe they are more popular than a rewrite in Rust in that domain (where there are reasons not to rewrite in Rust other than just cost, though).
Undefined behavior is not the only kind of bug in C programs, and it's far from clear that fixing all the bugs, or even all the undefined behavior, in an existing C library will be less effort than rewriting it. Consider that one of the worst security bugs in history was a result of Kurt Roeckx eliminating an undefined-behavior bug from OpenSSL.
It's pretty much the only kind of bugs that Rust can prevent, too.
> and it's far from clear that fixing all the bugs, or even all the undefined behavior, in an existing C library will be less effort than rewriting it
It's pretty clear to me.
> Consider that one of the worst security bugs in history
I'm not sure what bug you're referring to, but if it's Heartbleed, then that was an undefined-behavior bug. Of course, a functional bug can be introduced at any time, including during a rewrite in another language.
No, I'm talking about the Debian OpenSSL bug Luciano Bello discovered. It was a lot worse than Heartbleed. Kurt Roeckx didn't introduce Heartbleed, and Heartbleed wasn't introduced by removing undefined behavior, so there is no plausible reason for you to infer that I was talking about Heartbleed.
As for the cost of rewrites, there's a lot of evidence from software project metrics that the cost of modifying software can easily exceed the cost of rewriting it; see Glass's Facts and Fallacies of Software Engineering for details and references. Also, though, it should be intuitively apparent (though perhaps nonobvious) that this is a consequence of the undecidability of the Halting Problem and Rice's Theorem — it's impossible to tell what a given piece of software will do, which means that the cost of reproducing its existing behavior in well-understood code is unbounded.
> so there is no plausible reason for you to infer that I was talking about Heartbleed.
Except that Heartbleed is the only OpenSSL bug I've heard of :) Also, I don't know who Kurt Roeckx is.
> there's a lot of evidence from software project metrics that the cost of modifying software can easily exceed the cost of rewriting it
But we're not talking about arbitrary modification, but about, at worst, fixing undefined behavior, which requires only local modifications (or Rust wouldn't be able to prevent that either). As an ultimate reduction, you could choose to rewrite the software in C and still use sound static analysis to show lack of undefined behavior.
> which means that the cost of reproducing its existing behavior in well-understood code is unbounded.
Yes, but that still doesn't mean that a rewrite is cheaper. Also, while your conclusion is correct, your statement of Rice's theorem is inaccurate: it's impossible to always tell what every piece of software will do. It's certainly possible to tell what some software will do, at least in some cases, or writing software would be impossible to begin with.
I appreciate your clarification! Indeed, I didn't mean it was impossible to tell what any software would do in any situation, only some software (in practice, nearly all) in some situations. The contrary would imply that not only writing software but also running it would be impossible.
Your blog post looks very interesting indeed! I will read it with care.
I do think there's a subtle point about modifying software. Not just any modification of the software that lacks undefined behavior will do; we want a modification that preserves the important aspects of the original software’s behavior. Not only is this easy to get wrong—as shown spectacularly by the OpenSSL bug (which you've presumably looked up by now), but also, for example, by the destruction of the first Ariane 5—but there is no guarantee that it can be done with purely local modifications, even if the final safety property you wanted to establish can be established with chains of local reasoning.
I do agree that sound static analysis of C that is written to make that analysis tractable is just as effective as rewriting in Rust. Not only can such analysis show the absence of undefined behavior, it can show arbitrary correctness properties, including those beyond the reach of Rust’s type system. Probably the strongest example of this kind of analysis is seL4, although now its proofs verify not only the C but also the machine code, thus eliminating the compiler from the TCB.
Yes, I looked up the OpenSSL bug you referred to, and I think it's quite unusual. I'm not sure what those lines were exactly, but from the description it seems like it was intended to read uninitialized memory, something that (safe) Rust won't let you do, either. Also, it's probably wrong even in C, but it worked. So yeah, touching code in any way is not always 100% safe, but my point was just that sound static analysis is still cheaper than a rewrite, as it requires far less modification.
As to seL4, it isn't exactly similar to sound static analysis, as the work was extremely costly. All of seL4 is 1/5 the size of jQuery, and it took years of work. But it also includes functional verification, not just memory safety. In fact, it is among the largest programs ever functionally verified to that extent, and yet it was about 3 orders of magnitude smaller than ordinary business software, roughly the same verification gap we've had for decades. We don't yet know how to functionally verify software (end-to-end, like seL4) of any size that's not very small.
Anyway, Rust offers a much more limited form of assurance, and sound static analysis tools offer the same, and at a lower cost for existing codebases.
It's not that significant. We can tell what the vast majority of existing software will do in an automated way. Compiling a program is the equivalent of encoding its semantics in another language, which implies knowing what it will do -- at least that's one way of 'knowing what it will do'.
You can write down the physical laws that apply to a given system, but we don't usually call that "knowing what it will do", unless you can actually predict the state, or in the case of a program, the output. The mere fact that compilers exist is a meaningless form of "knowing what the program will do", only superficially relevant. You can't solve the halting problem with compilers in the same way that Newton's laws don't solve the three-body problem.
Also, did you catch the part where the point is about how expensive it is?
> but we don't usually call that "knowing what it will do", unless you can actually predict the state, or in the case of a program, the output.
Who is 'we'? And yes, we can predict exactly what the output of a given program for a given input is, for the vast majority of cases. All you have to do is run the program.
> The mere fact that compilers exist is a meaningless form of "knowing what the program will do", only superficially relevant.
You think static analysis, type checking, intermediate representation, optimization, the translation of the program with exact semantics into another language, etc. - is 'superficially relevant' to understanding a program?
> You can't solve the halting problem with compilers
Now that's pretty irrelevant.
> Also, did you catch the part where the point is about how expensive it is?
Did you catch the part where I was only commenting on a specific part of the comment? But tell me, how expensive is it?
It would be hard to overstate how incorrect this statement is, if it is read with the implicit qualifier "for all possible inputs", without which my comment above would be obvious nonsense. Of course we can tell what most programs will do for some inputs—we can just run them!
Yup, we can tell what most existing software will do for all inputs. Rice's theorem states that we can't tell what all software will do, not that it's impossible to tell what a given piece of software will do.
The fact that we can determine what a piece of software will do, doesn't mean we always do that kind of analysis, or that the programmer fully understands his own code. That's why we have type systems, constraints, verification tools, etc.
In practice, how much does “not completely free” cost? TrustInSoft doesn’t even publish prices, which pretty strongly suggests I couldn’t afford it nor persuade a manager to expense it.
That's not always true. While I have no idea of what this particular solution costs, I have seen licensing costs that would easily pay for a 20+ person team.
The size of the software is important to consider when you are looking at the cost of rewriting.
While not as mature as Rust, static enforcement of C++'s (memory and data race) safe subset is coming along [1].
IIUC, tools like TrustInSoft are for situations where you need not just safety but "reliability" (i.e., no crashes, no exceptions). That doesn't really scale so well to larger applications.
TBH I don't see a future where RIIRing every piece of software would make sense, and I think you sort of put it in there somewhere in your comments.
But yes I do see that for many it would and for those, RIIR would IMHO be an incremental process of oxidizing your project to the point that there's nothing left but Rust.
Writing something from scratch and waiting for it to finish is the biggest problem we face, because eventually the will to continue with the effort just dies.
But creating meaningful ground, not just some "safer" bindings, does help in inviting others to share the effort as well.
So hopefully people will be smart and identify what they should do for their projects, should it be a rewrite or should it be a new feature that you write in Rust. :)
And no, I don't think bindings are the solution to anything; they serve no purpose in the Rust community as a long-term measure. You shouldn't have to sacrifice safety and/or performance in the case of a high-level language.
> TBH I don't see a future where RIIR every piece of software would make sense
It doesn't necessarily have to be Rust, and it's a process that may take decades. But I really do think that absolutely everything should be rewritten in a memory-safe language (i.e. not C/C++).
The vast majority of security issues are either stupid misconfigurations or memory-safety issues. And currently most security initiatives are undermined by the fact that they're resting on insecure foundations. Imagine if our computing platforms were truly secure. It would be a revolution, and IMO it's a revolution that's coming.
Exactly, when we talk about RiiR, we are saying, "I would like to have the guarantees that Rust provides". If the Rust ecosystem depends heavily on C/C++ libraries, then the system has the properties of the union of all the flaws and the properties of the language do not translate into a quality of the ecosystem.
That said, I think containing C/C++ to a Wasm sandbox that can be integrated transparently with Rust would be a gigantic win for security, correctness and the Rust ecosystem.
>"I would like to have the guarantees that Rust provides". If the Rust ecosystem depends heavily on C/C++ libraries, then the system has the properties of the union of all the flaws and the properties of the language do not translate into a quality of the ecosystem.
Of course ideally you get rid of all C/C++, but that may not be feasible. Combining Rust with some C/C++ still adds the benefit of Rust safety to the new code you write. The contained C/C++ is like the unsafe section of Rust code, a place where you have to be more vigilant, but you're slowly constraining the space where these issues can arise. And you can iteratively only replace those parts that cause a lot of issues.
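That containment follows the usual safe-wrapper pattern: the unsafe core (standing in here for a C function reached over FFI; the names are illustrative) is hidden behind an API that checks every invariant the core assumes, so the rest of the codebase never touches unsafe directly. A minimal sketch:

```rust
// Unsafe core: a stand-in for a C function reached through FFI.
// Caller must guarantee `index` is in bounds.
unsafe fn raw_get(ptr: *const u8, index: usize) -> u8 {
    *ptr.add(index)
}

// The safe wrapper validates the invariant before crossing the
// boundary, so callers get an Option instead of potential UB.
fn checked_get(data: &[u8], index: usize) -> Option<u8> {
    if index < data.len() {
        Some(unsafe { raw_get(data.as_ptr(), index) })
    } else {
        None
    }
}

fn main() {
    let buf = [10u8, 20, 30];
    assert_eq!(checked_get(&buf, 1), Some(20));
    assert_eq!(checked_get(&buf, 99), None); // out of bounds: an Option, not UB
    println!("ok");
}
```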
TBH I still don't see it happening, or being the case, at least in the foreseeable future...
Why, you may ask? The answer is simple: because there is always a cost.
So the question is what is easier, or when one can say the cost of using a "safer" language outweighs that of a "less safe" one.
Because otherwise Rust is also too damn unsafe; just move to something completely safe -- Idris could be a good starting point. /s
Putting the sarcasm aside: Rust does not guarantee memory safety; it tries to help with it. (Check reference cycles.)
The safest feature of Rust would be its protection against data races.
So at the end it's all about the trade-offs and deciding what you are willing to sacrifice and what you aren't...
Over the medium term (years to decades), there is no question that rewriting in a safe language is worth the cost. It's a one-time project, and security issues cost our economy billions on an ongoing basis.
Rust isn't perfect, and perfect automatically-checked safety probably isn't possible, but it's dramatically safer than C/C++, and it cuts down the amount of code that would need to be manually audited to the point where auditing it comprehensively becomes feasible.
I agree that everything should be made in a memory safe-by-default language like Rust and I really need to see the Rust ecosystem really commit to this. Too many times will I come across a library with some very questionable uses of `unsafe`, which in my opinion has no place in things like HTTP request libraries or web frameworks.
When was the last time a Java or .NET codebase had an RCE from writing past the end of a list? The only RCEs in safe languages generally only happen in uncommon explicit code-loading calls or explicitly unsafe memory access calls.
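Safe languages turn that class of bug into a deterministic failure instead of memory corruption. A hedged Rust illustration of the same property the parent describes for Java/.NET:

```rust
fn main() {
    let list = vec![1, 2, 3];

    // Checked access: out-of-range yields None, not a wild read.
    assert_eq!(list.get(10), None);

    // Even the indexing operator is bounds-checked: this panics
    // deterministically instead of reading or writing past the
    // allocation, which is what turns a bug into an RCE in C.
    let result = std::panic::catch_unwind(|| list[10]);
    assert!(result.is_err());
    println!("out-of-bounds access was caught, not exploited");
}
```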
More than Rust in multithreaded scenarios, actually.
Not only do they have automatic memory management, they have an industry-standard memory model for hardware access, adopted by the C and C++ standards and used as inspiration for std::atomic<> on CUDA.
While you keep being baffled, where is Rust's memory model specification?
There's a good argument to be made that Rust is safer in multithreaded scenarios than the JVM or .NET.
Rust statically prevents inappropriate unsynchronized accesses for arbitrary APIs (for instance, the compiler will emit an error when attempting to mutate a non-concurrent object/data structure from multiple threads). Those VMs make unsynchronized mutations of individual memory locations work just enough to not be unsafe, but still allow arbitrary unsynchronized operations. These will likely end up with an incorrect result, even if there's no memory unsafety, and may completely violate any internal invariants of the object or data structure. This latter point means one cannot write a correct abstraction that relies on its invariants without explicitly considering threadsafety (it is opt-in safety), whereas Rust has this by default (opt-out safety).
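A minimal sketch of the opt-out safety described above: sharing a plain Vec mutably across threads is a compile error in Rust, so the synchronized version below is the only one that builds.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // This variant does not compile: you cannot mutate through a
    // shared Arc without a synchronization type.
    //
    // let data = Arc::new(Vec::new());
    // thread::spawn(move || data.push(1)); // error: cannot borrow as mutable

    // The compiling version: all mutation goes through a Mutex, so the
    // Vec's internal invariants hold under any thread interleaving.
    let data = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || data.lock().unwrap().push(i))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    let mut v = data.lock().unwrap().clone();
    v.sort();
    assert_eq!(v, vec![0, 1, 2, 3]);
    println!("{:?}", v);
}
```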
Rust has essentially adopted the C/C++11 concurrency model.
Yes, it is one less failure point to worry about, but in my experience using the respective task libraries in distributed applications, most of the multithreaded-access bugs lie in external resource usage, and those Rust does not prevent.
Ownership/affine typing does allow modelling that, within a single program.
For instance, the type representing a handle to the external resource can have limited constructors and limited operations, that will enforce single threaded mutations or access (including things like "this object can only be destroyed by the thread that created it").
Assuming you are accessing the distributed resource from multiple threads instead of multiple distributed processes, which is quite common in distributed architectures.
Borrow checker doesn't help at all with process IPC.
Plus all ML derived languages have good enough type systems to model network states, while enjoying the productivity of automatic memory management.
At least where C++ is concerned, there are several ways to improve its safety without bearing the cost of a full rewrite.
Using standard library types, actually integrating sanitizers into CI, enable bounds checking (even on release builds) and above all avoid C style coding.
Naturally this doesn't work out for third party libraries that one doesn't have control over.
So from a business point of view it boils down how much one is willing to spend re-writing the world vs improving parts of it.
You can write safe code in C++, but it requires you to go way out of your way and exercise constant vigilance, and it's not always obvious when you mess it up. This is the approach we've been trying for 20 years, and it generally hasn't worked well, because people are both flawed and lazy. This is the thing that's fundamentally different about Rust — it provides strong guarantees by default and requires you to specifically call it out when you're doing something potentially unsafe, so to some degree it turns our laziness into a force for good.
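The default described above is mechanical: anything that could violate memory safety simply does not compile outside an unsafe block, so the dangerous spots are greppable. A minimal sketch:

```rust
fn main() {
    let x = 42u32;
    let p = &x as *const u32;

    // This does not compile: dereferencing a raw pointer requires an
    // unsafe block, so the compiler forces you to call the line out.
    //
    // let y = *p; // error: dereference of raw pointer is unsafe

    let y = unsafe { *p }; // the one greppable spot to audit
    assert_eq!(y, 42);
    println!("{}", y);
}
```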
C++/CLI is basically a set of language extensions, just like clang and gcc have theirs.
Right now it supports up to C++14, if I am not mistaken.
Many don't seem to realise that CLR started as the next evolution of COM, with the required machinery to support VB, C#, J# and C++, alongside any other language that could fit the same kind of semantics.
I agree, and I see a lot of analogies between this situation and the ObjC -> Swift transition which many code bases have gone through. When there are significant semantic benefits in the new language, a slow war of attrition against the old language is a good move.
Alternatively, you could have C/C++ as a third, discrete category somewhere in the middle of the C <---> C++ spectrum. For people who like namespaces, or overloading arithmetic operators for mathy vector/matrix/etc stuff in order to make the code more readable.
I think this is a clever concept and useful article. However, I have to disagree with the severity of this statement:
> at best, the temptation to RiiR is unproductive (unnecessary duplication of effort)
For tiny, mature libraries, this might be true.
But a rewrite in Rust can also vastly improve maintainability. If a library is under heavy maintenance (and will continue to be), your investment in rewriting it is likely to pay dividends by saving developers -- especially ones who are new to the library -- a lot of time.
Another reason I'd prefer things to be rewritten in Rust rather than using bindings is that calling out to external build systems introduces its own set of problems. I've run into more problems with C-based dependencies (slower builds, OOM errors, confusion over static vs. dynamic linking, etc.) than with any pure-Rust dependency.
I'm thinking more about libraries like TensorFlow, ssh, or libgit2.
Yes you could rewrite them in Rust, but considering these are tools which are used all over the ecosystem and have massive momentum behind them, your time would be better spent building cool things on top of existing work than reinventing the wheel and trying to convert the ecosystem to something which will (initially, at least) be an inferior product.
The original statement was deliberately opinionated and extreme, but I still feel like the pragmatic approach of reusing existing libraries instead of rewriting them is the best one for the short/medium term (jury's still out on the long term costs/effects).
There's no way to write a safe interface for a C library like OpenSSH where critical vulnerabilities to malicious payloads have been found and exploited. All this stuff needs to be replaced if we're ever to have a trustworthy foundation.
> The first step in interfacing with a native library is to understand how it was originally intended to work.
But reading code is hard, it's easier to project your sense of confusion onto your predecessor than own it yourself, and once you rewrite the code in Rust and start to understand the true complexity that the previous code had to work around, you'll leave for another job (this time with Rust on your resume).
My life as an immigrant developer, therefore cheaper, hired to fix the shit of the elite quitters :D
Sometimes I catch them a few weeks before they quit and get to ask why they developed a low-level HTTP server in Java interpreting controller code written in JavaScript that nobody understands but them. "It's much more efficient than using PHP or Node.js or higher-level Java." "Did you benchmark it?" "I have to go do more knowledge transfer, bye"...
I don't know what I'd do if they were allowed by the company to use Go or Rust :D I guess I'd redo their wheel reinvention in the majority language of the team... adding to the constant migration I see again and again...
No, it also happens in professional sports. Star players leave the teams that drafted them to look for better opportunities. It seems to be an inevitability that small market teams draft and develop star players who then quit on the fans for a big contract or a chance to play with other stars in the big city.
That's true but the difference of course with a star player is that the hiring team can examine in detail the past performances of the player. They usually only hire if the player is competent.
The problem is that the elite quitter is usually incompetent. They are capable of learning some new things. But they are incompetent in the sense that they are incapable of seeing that the new thing is often the same as or worse than the old thing. They are incompetent in the sense that they have a low attention span and do not follow up on the things they do so they never learn from their mistakes. They often love complexity etc etc
So the hiring team cannot see these details directly. It's up to them to have competent interviewers to flush this out. But of course they are often in awe of the latest fad themselves and so on it goes.
Sports players have a limited amount of time to exploit their bodies.
Developers have a much longer lifetime of continuous learning, opportunities to get better, and focus on being the best they can be.
But at the end of the day, when a developer or player no longer offers their organization value, they will be cut loose. This is hugely destructive to that person's ability to care for themselves and their families. Is it any wonder that they want to do whatever they can to secure a stable future for themselves?
Don't forget the key ingredient in this recipe: they show a small burst of greatness or give a couple shining glimmers of hope and then capitalize on that small sample size.
If you could be rewarded with an incredible salary for sticking at the same company for 10 years and making a great, reliable platform using "boring technologies", then more people would probably do it.
Instead, to raise your salary you've got to jump jobs every two years.
> If you could be rewarded with an incredible salary for sticking at the same company for 10 years and making a great, reliable platform using "boring technologies", then more people would probably do it.
> Instead, to raise your salary you've got to jump jobs every two years.
I believe the majority of America stays at their employers. It just seems to be the en vogue style for this group. I've made careers at my past two employers. I like to think I've made out very well by sticking to the same company for 4 years, going on 5, much better off than if I had joined a startup working slave labor for much less total compensation, only to have my shares diluted once an exit happens.
I've done it all at this point in my career, and sticking to a stable, cash-positive business is the best option at this point, IMO of course. Everyone likes to think they'll be the special 1% to make that big exit but then again our generation (millennials) were raised to believe we were special so it makes sense why people go chasing the dollar.
Your employer takes a little more time to adjust, but in the end, people end up where they should be over time.
As long as companies keep expecting experience in very specific technologies (only at work—400 hours on a hobby codebase don't count for as much as 40 hours writing something at a job for determining competence in a language or framework or whatever, it seems) this will keep happening. It's really, really dumb, but there's little other choice for folks who want to keep their skills (=the names of tools & languages they've used to write something for pay) up to date (=trendy high-paying buzzword compliant).
Doubly true if you're not all-in on one of the major Bigcorp silos (Java, C#).
At a previous job I worked with someone who didn't see a buzzword they didn't like. Since nobody was stopping them, they designed an incredibly convoluted system using all the latest tech. In the end we replaced it with a JVM server and a Postgres database.
I occasionally check their LinkedIn and they're still spending 12-15 months every time at companies, being CTO or Big Data Strategist or whatever.
anecdata but I've only had terrible experiences with companies looking for specific technologies (e.g. a job listing of "$LANGUAGE Developer").
Mostly good experiences when it's a specialist gig that happens to require a specific technology. Like some parts of a stack aren't interchangeable and aren't easy to pick up overnight.
"A much better alternative is to reuse the original library and just publish a safe interface to it."
I think I must reject the premise a bit, unless I misunderstand how Rust works.
If you write a safe wrapper around unsafe code, is it not still the case that if the unsafe code bombs out, it will take the safe Rust down with it, engulfed in shared flames?
(Unless you spawn unsafe code in its separate process or something elaborate like that.)
(Then again, from a practical standpoint the reasoning may be good. Maybe, even likely, the C library is very good and very battle tested. And very likely, my would-be novice re-implementation in Rust would not be very good.)
A lot of the time, the danger with unsafe code is doing a thing like passing a null pointer into a function that can’t handle null pointers, or using something before it is instantiated. So the interface should force those things not to happen (with the type system or a constructor or...). This doesn’t prevent the c code from being buggy, but it makes it much easier to use, since you can ensure that you’re maintaining the invariants.
Also even if you do plan to re-implement the C library in Rust it's still a good idea to do this step (make a safe wrapper around the FFI bindings) first. It lets you set up your Rust test cases, and gives you a step to design what the Rust API will look like. You can proceed to pull bits of the C out and into native Rust, all while whatever you've built on top doesn't have to change.
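As a rough sketch of what such a safe wrapper can look like, here is libc's strlen standing in for an arbitrary native function (the wrapper name and the choice of strlen are illustrative, not from the article):

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// FFI declaration for a C function; strlen is a stand-in for any
// native library call you might wrap.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// Safe wrapper: the types guarantee we always pass a valid, non-null,
// NUL-terminated pointer, so the `unsafe` block upholds strlen's
// preconditions.
fn native_strlen(s: &str) -> usize {
    let c_string = CString::new(s).expect("input must not contain interior NULs");
    unsafe { strlen(c_string.as_ptr()) }
}

fn main() {
    assert_eq!(native_strlen("hello"), 5);
}
```

Callers never touch a raw pointer; the invariants the C code relies on are enforced once, in the wrapper.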
One big difference between this approach and just using the original library directly is that you now have a type system that can enforce more invariants than is possible in C, invariants the library was previously already relying on. If the (now reduced) API surface still happens to expose a bug, it can be fixed in the library and/or protected against in the API.
Summary:
- Rather than rewriting a C++ library in Rust (difficult and will introduce bugs), wrap it in a Rust interface.
This seems like the cleanest way to move a codebase forward without too much regression work. It reminds me of how TensorFlow 2.0 still contains tf.compat.v1 APIs so devs can write forward-facing code while maintaining existing modules.
You'll need to do this anyway if you want an actual library that can be independently packaged, because that means you have to use the system ABI in the library itself. Rust does not provide a stable ABI.
I don't think I've ever met a developer who didn't want to rewrite their predecessor's code and that includes the one I see in the mirror. Doing it in a different language would definitely be more interesting.
Do we admit that this is a factor in the decision to rewrite vs. reuse?
"I don't think I've ever met a developer who didn't want to rewrite their predecessor's code and that includes the one I see in the mirror"
Allow me to introduce myself ;) I mostly work on new products, and rewriting to me is just a waste of time. I am interested in creating features that give actual ROI. The only time I rewrite a piece of code is if said piece presents a major problem (bugs, performance, etc.).
Generating C/C++ bindings in Rust is unfortunately still quite a hassle in my experience: C libraries often end up with important functions as static inline in header files (these do not get translated automatically by bindgen). If the library uses the macro system to rename functions, these too won't show up as symbols in your library (so you'd have to manually define those names as well). Then bindgen often pulls in too many dependencies, and I end up manually white/blacklisting which symbols I want/don't want (and tracking changes through multiple versions). That being said, it's a hard problem, and Rust still has some of the better tooling for FFI compared to other languages I've seen.
Where are you getting your CHM files? If you're getting them from untrusted sources — that is, people you wouldn't give your password to — you probably don't want to pass them to a buggy C library. That's just begging to get pwned! The only way this seems like a reasonable idea is if your CHM files are generated by your own software or by something like SafeDocs: https://www.darpa.mil/program/safe-documents
Wrapping your buggy C library in Rust isn't going to give you the kind of security against malicious data that a rewrite in Rust would give you.
The tree only contains configure.in, not the generated configure output, and Makefile.am, not the generated Makefile.in. You do need the entire autotools suite installed for it to build.
Why they published it like this is beyond me, however. The whole point of autotools is not making your users install them (unlike CMake and virtually every other build system for C there is).
"It" in my post referred to the Rust cargo package in question. The package ships configure.in, but not the generated configure. You do need autoconf for configure.in -> configure. Similarly, it ships Makefile.am, but not the generated Makefile.in for use with configure.
Note how I said "The whole point of autotools is not making your users install them".
Same as a npm package is usually js code and doesn't require you to run the typescript compiler, a dist tarball shouldn't require you to run autotools but just contain their output.
No, parent answers the question, "What is the difference between the git tree and a dist tarball?" Which is self-evidently (and unhelpfully) "you don't need autotools installed with the dist tarball". My questions are: Why does this difference exist? Why would you not track everything necessary to build a library in git? How is the dist tarball built differently than just zipping the current git tree?
Autotools is rarely portable. I've been doing a lot of cross compiling recently: the vast majority of autotools projects can't be cross compiled. Sure autotools itself supports it, but something required to make it work didn't get connected up and so you can't do it.
If you want to natively compile on a fairly recent linux with the common standard libraries autotools works well. Even though autotools was written to work around these differences, in most cases the developer didn't hook into autotools detecting that difference and so it doesn't work.
Embedded Linux on arm. Nothing very far out really. Note that I'm careful not to blame autotools, it is the user's fault, there are a couple projects that use autotools and cross compile easily. The vast majority do not.
CMake based projects almost always cross compile easily.
I still have issues with Rust string handling. Half of the time I have no idea how to get strings out of libraries. I don't understand the intentions of the language creators at all. Does anybody have a great intro to Rust for software engineers who come from Python or something similar where high level types are the norm?
All of Rust's complexity for strings comes out of complexity due to UTF-8. Well, a tiny bit comes from the fact that Rust has pointers too, but most of it is UTF-8.
Did you happen to read the book? We cover strings early because this is a common pain point.
Also, if you have something more detailed than "get strings out of libraries", I can give better advice. It's tough to tell what the actual issue is.
Ah, I see! So yeah, this isn't something the book covers, so that would not be helpful.
In this case, the hash ends up being raw bytes. So the question is, what are you trying to do with this hash? Being a bunch of bytes, this may not be valid UTF-8 (and in this case, is not ASCII, let alone UTF-8). One option is to encode to base64:
    use sha2::{Sha256, Digest};
    use base64;

    fn main() {
        let mut hasher = Sha256::new();
        hasher.input(b"hello world");
        let result = hasher.result();
        let encoded = base64::encode(&result);
        assert_eq!("uU0nuZNNPgilLlLX2n2r+sSE7+N6U4DukIj3rOLvzek=", encoded);
    }
But yeah, you can't exactly "get a string" out of a random bag of bytes unless you know how you want that bag of bytes to be represented, encoding wise. That's not exactly a satisfying answer, but such are strings!
This seems to be a conceptual mistake that often happens when coming from high level languages, I have seen similar confusion on stack overflow relating to packet based network protocols.
What many low-level data formats deal with are raw bytes ("byte strings" in some contexts). The output of a hasher is random binary data.
What most humans are accustomed to are some form of encoding that makes it easier to spell out the octets. Hex and base64 are common. But they are not the native representation that the machine deals with.
With almost everything being explicit in rust you need an explicit conversion from bytes to hex, base64, urlencode or some other format.
Those conversion methods (with their contract often expressed as a separate trait) may be provided by the same crate or you may have to pull them in from a different crate.
A hasher does not need to provide this itself because its output types can be enhanced by foreign traits.
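As an illustration of how explicit such a conversion is, hex encoding needs nothing beyond the standard library (the `to_hex` helper here is made up for the example; in practice crates like `hex` or `base64` provide this):

```rust
// A minimal hex encoder: each raw byte becomes two hex characters.
// Crates like `hex` offer this off the shelf with better performance.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    let raw = [0xde_u8, 0xad, 0xbe, 0xef];
    assert_eq!(to_hex(&raw), "deadbeef");
}
```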
This is helpful, but in general, what is the problem with having strings and trivial functions to convert something (like bytes) to strings? Maybe I am missing the point.
Because converting a type to a string isn't trivial. It needs to be defined for that type.
Rust does have a pair of traits that can be used, Display [1] and Debug [2]. Display must be implemented manually by the author of a type, while Debug can be automatically derived if all the members of a type implement Debug. Like this:
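A minimal sketch of the two traits (the `Point` struct here is illustrative):

```rust
use std::fmt;

// Debug can be derived because every field implements Debug;
// Display must be written by hand.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{:?}", p); // derived Debug: Point { x: 1, y: 2 }
    println!("{}", p);   // hand-written Display: (1, 2)
}
```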
The process of "functions to convert bytes to a string" is called "encoding". The problem is, there's no universal way to do it, so you need to choose one. You need to know what the bytes mean, and then convert them into whatever representation that you need.
It is unfortunate that you’ve been downvoted for asking a question.
Happy to help. Feel free to post questions on users.rust-lang.org as well; people are super happy to elaborate on any question, no matter how big or small.
Rust doesn't have varargs, so formatting functions tend to be macros (allowing you to pass in multiple arguments, parsing the format string at compile time to allow type checking, etc.), so they are generally not that trivial.
There are some examples of printing output using formatting strings in the RustCrypto readme:
    let hash = Blake2b::digest(b"my message");
    println!("Result: {:x}", hash);
If you're looking to get the actual string value rather than just print it out, then take a look at the format macro. It uses the same format strings / traits etc as println, so wherever you see println you could drop in format instead:
    let hash = Blake2b::digest(b"my message");
    let text = format!("{:x}", hash);
It sounds like you're coming from a language (like C) where a string is just an array of bytes or a language (like Ruby or Javascript) where strings are seriously overloaded.
In Rust a string is a UTF-8 sequence. Typically human readable. A byte is generally represented by the u8 type, and a collection of bytes is either an array or vector (e.g. [u8] or Vec<u8>). To create a human readable form of an object in Rust you'd typically use the Display ({} format specifier) and/or Debug traits ({:?}). A string (e.g. &str or String) is NOT a collection of bytes. There are exceptions however with OsString/OsStr representing something closer to a collection of bytes and CString/CStr representing a bag of bytes.
Your comments seem a bit XY-ish to me. What are you trying to solve by converting things to strings? For debugging or human readable output Array, Vec, and u8 all implement the Debug trait. u8 also implements UpperHex and LowerHex so you can get a hex formatted version as well (e.g. format!("{:02X}", bytes)).
For the (MD5) hash case, as others have pointed out you're looking at a base 64 encoding which you'll often have to do on your own depending on the library you're using.
Now if you're struggling with owned vs borrowed strings that's a whole other matter. You can insert some magic into your functions with generics and trait constraints and into your structures with the Cow type (clone on write).
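A small sketch of the Cow pattern mentioned above (the `sanitize` function is made up for illustration): the function only allocates when it actually has to change the input.

```rust
use std::borrow::Cow;

// Returns the input unchanged (borrowed) when possible, and only
// allocates an owned String when a modification is needed.
fn sanitize(input: &str) -> Cow<str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    assert_eq!(sanitize("no_spaces"), "no_spaces"); // borrowed, no allocation
    assert_eq!(sanitize("has spaces"), "has_spaces"); // owned copy
}
```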
since we know that Sha256::new() will always produce 'valid' characters for utf8 you can just do
    let my_rust_string = str::from_utf8(result).unwrap();
If there is a possibility of an error, (you read it from file, network ...) you can do:
    let my_rust_string = match str::from_utf8(result) {
        Ok(str) => str,
        Err(err) => panic!("Invalid utf8: {}", err),
    };
Or you can do a lossy conversion (it will put that question mark on chars that it can't decode):
    let my_rust_string = String::from_utf8_lossy(result);
edit: OK, so I read the sha2 API wrong. result() is raw bytes and result_str() is a hex-encoded string. The above won't be much use for either, and others have already explained what to use.
I just started programming in Rust several months ago, and that was my pain also, since where I live we still have other encodings in use.
It’s like that because strings are very difficult to get 100% right even in gced languages (Python 2 says hello) and when you take into account ownership and different standards used even in the same OS it all becomes a nightmare. You have to know when you want a reference or an owned object, when you want mutable vs immutable and when you want portable vs OS specific, and god forbid you put a path in a string. This is why rust has multiple string type families, all of which (well, most) Python very conveniently hides, for good reasons.
Thanks, it would be great to cover all these types with common scenarios (like using a hash function, storing something in a database or things that are the most common in programming). If it is a good idea to hide those things then we need a library that does exactly that for Rust. Don't you think?
They’re good reasons for Python but not necessarily for Rust, where you are expected to care about such details by design. It’s one of the language’s differentiators.
Nonetheless a cookbook for converting strings for different use cases isn’t a bad idea at all.
In another language you'd risk having a pointer to invalid memory when the function returns if the pointer escapes the function, and so it is advised to use pointers to objects on the stack sparingly if at all.
In Rust, the compiler will statically determine whether a pointer to an object on the stack could escape the function, and if so it will fail to compile.
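A sketch of what that looks like in practice (the function names are illustrative): the escaping version is rejected at compile time, and the fix is to move ownership out instead of borrowing.

```rust
// Rejected by the compiler: the reference would outlive the local `s`.
//
//     fn dangling() -> &String {
//         let s = String::from("hello");
//         &s // error: missing lifetime specifier / returns reference to local
//     }
//
// The accepted version moves ownership to the caller instead:
fn no_dangle() -> String {
    let s = String::from("hello");
    s // `s` is moved out; nothing is left pointing at dead stack memory
}

fn main() {
    assert_eq!(no_dangle(), "hello");
}
```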
>so it is advised to use pointers to objects on the stack sparingly
At least in the C/C++ world, this is not true. Where possible, we prefer stack allocated objects. Stack allocation is fast and usually gives you better cache performance than heap allocation.
Rust's lifetime/borrow checker provides compiler support for well-established best practices amongst professional C and C++ users.
It looks like you are arguing about what "sparingly" means in that context, because the rest of the sentence is totally valid.
Stack and Heap allocation in C serve different purposes and there's really no rule of thumb to choose one and there's not even one case more frequent than the other.
Allocate objects on the stack, it's faster. Unless you need a reference to one in another context. Or it's too big. Or you're going to share it between threads. Or whatever.
> In another language you'd risk having a pointer to invalid memory when the function returns if the pointer escapes the function, and so it is advised to use pointers to objects on the stack sparingly if at all.
Just a nitpick, this is only true for non-garbage-collected (or refcounted) languages. :-)
A garbage-collected language wouldn't let you allocate such items on the stack in the first place.
GC languages tend not to let you choose where to allocate in the first place, and have obligatory heap semantics for reference types. Some compilers (including Go and Java, to my knowledge) will attempt to optimize the implementation to stack allocation when possible using escape analysis, but this is less precise, and more opaque, than Rust's allocation. And you won't get feedback if it stops working.
So, by GP's comment of "unheard of (or just plain dangerous)", the "unheard-of" applies to GC languages and "plain dangerous" would apply to non-GC languages.
I prefer the term "obligate-GC languages": you don't get to choose whether GC runs. Otherwise, Rust and C++ count as "GC-enabled", and comparisons are vacuous.
Reference-counting, a form of GC very commonly used in Rust and C++, is inefficient compared to more intrusive schemes, particularly where there may be contention for the count, but may be applied selectively, e.g. never on critical paths, so that the inefficiency has zero impact on overall system performance.
Obligate-GC advocates like to point at custom benchmarks showing overhead at a small percentage. They are invariably lying, by reporting only the time that the profiler says the program counter is pointing to GC code, and hiding the much-larger results of loss of cache locality on the system as a whole. Typically they don't even know they are lying, which should not give one much confidence in their engineering judgment.
You mean custom benchmarks like Midori powering Bing for Asian requests, the ixy paper, Android GPGPU debugger, Fuchsia TCP/IP stack, ChromeOS and Google Cloud sandboxes.
They admit it wasn't quite full-GC stuff. It was close to the C++ version. It did use the language's safety and parallelism to its advantage on top of the Midori OS. They ended up improving performance over the original.
Still worth mentioning even if not fully-GC. I mean, they could've always used a real-time GC if that was important. There's already commercial and academic ones. For some reason, these projects never try to do that. I think even those developing GC'd systems might not know about RT designs.
> A garbage-collected language wouldn't let you allocate such items on the stack in the first place.
I get your point, but I don't think this is entirely true.
When people think of call stacks in the general sense, there's a major distinction between CPU stacks and managed stacks.
For example in both Java and Go, stack allocations are determined at compile time, and heap allocations at runtime. (Obv, stacks here are not actual CPU stacks, just managed memory stacks depending on the implementation.)
You omitted an important part: the pointer pointing to the stack. In GC'd or refcounted languages, (almost) everything is on the heap (occasional exceptions being primitive types like integers). This of course leads to worse performance because of an additional dereferencing step and cache misses.
Which garbage-collected languages put automatic variables in an actual stack that gets bumped on function calls even when you pass pointers to the automatic variables ?
In garbage collected languages, are there any objects on the stack? I’m not familiar with GC implementations, but at least theoretically everything is allocated on the heap. Maybe some implementations keep things on the stack as an optimization?
Escape analysis is a common optimization for GCed languages.
e: Also, many languages allow the user to create user-defined value types, which are copied rather than referenced. Variables containing these can be stack-allocated.
Unheard of might be overstating it, but in Rust you can do things like pass references to other threads and aggressively parallelise things (without lots of defensive copies) without worrying about it causing bugginess later.
Our job is not to program crap again and again, it's to deliver some value for the high fees we charge. Rewriting a working library to migrate languages is really hard to defend...
No it’s not. You didn’t write it the first time, and you grow from writing it the first time. Writing and shipping something in a new programming language is a fantastic way to learn both. This has only become contentious as people try to come up with a reason why rewriting things in Rust is bad, when really it is about as benign as trends go. I never heard the same complaints about rewriting things in Go (another language I love, btw), and even JS somehow had fewer naysayers.
You may question the usefulness of having “x but in Rust” and that is fair. However, people keep answering the reasons why you might do something like this and yet like amnesia the same bad opinions come out again next time.
Rewriting software written using Python, Java, and Ruby in Go is entirely different from rewriting software written using C and C++ in Rust. Go has easily quantifiable advantages over those other runtimes regardless of the original code's correctness.
People rewrite C and C++ software in Go too. In fact, unlike Rust, most Go software has no C dependencies at all. This is precisely the result of rewriting everything in Go.
To be clear I’m not suggesting you should always rewrite something versus just wrap it, not at all. I’m just saying the reasoning for not doing it being “you didn’t write the original so what do you know” is very offensive to me; how are you supposed to learn?! People write NES emulators all the time even though they already exist. While most are purely for practice it doesn’t really matter: they wanted to write it, they wrote it, they learned. Discouraging people from writing something that they want to write is upsetting to me.
Most of the common C dependencies were implemented in Go before its public release - https://golang.org/doc/go1. These implementors are exactly the "better-equipped" original authors to which the blog post was referring.
Most programmers are programmers for hire, and it would be a waste of everyone's time to re-implement (for example) libdispatch in Rust. If someone wants to write it on their own time for their own edification and no other purpose, fine -- but half-assed internal implementations of common functionality are a significant drag on the lives of other working programmers.
All of this has absolutely, truly, nothing to do with rewriting something in Rust. You could also rewrite the same library again in C and have the exact same conundrum.
Also, people build dependencies on effectively “hobby” projects all the time. Open source in particular works well this way because the dependents have a good reason to contribute back; the users become stakeholders and contributors. The same may not be true of internal software, but again, this has little to do with the concept of rewriting something in Rust.
This is really diverging into a discussion that has nothing to do with the original paragraph I addressed.
The linked RiiR post from 2016 pretty much described the behaviour of the people shown in this thread. It is hilarious to see the arguing for RiiR in this context.
Rewriting in some language is a matter of skill and preference. To even consider a rewrite there should be a rational reason, i.e. it is broken or can't work in the ecosystem.
For fun and learning you can do what you want, of course.
I have rewritten my small-ish image processing library [0] in Rust (from C) and liked it! I mainly wanted to learn, also Rust forced me to improve the API a bit.
The title of the article is “How to not RIIR”, not “How not to RIIR”. The article is about wrapping an API with a Rust interface so the title used here on HN does not fit the content of the article.
It's subtle. The second phrasing ("how not to x") is idiomatic, and means, roughly, "some things that you should not do when you do x", or "an example of how you should not go about doing x."
So, an article titled, say, "how not to paint your house" would be expected to outline things to avoid when painting your house, and "how to not paint your house" could just be "go for a bike ride instead."
Surely "How Not To..." is typically used when you're talking of a bad way to do something, or a way to fail - so this sounds like it's describing a bad way to rewrite something in Rust, which is not the case.
It also isn't the wording used in the original article title.
I don't disagree with you that the thesis of the article is "don't re-write something in Rust."
What I was trying to say is that I think the title is making a joke; it's saying "how to re-write something in Rust badly", because well, it's not re-writing it in Rust at all.
I thought the title was making a joke, but then I read the article, and IMO the actual article title of "How to not RiiR" is much more indicative of its content than the HN title of "How Not to RiiR".
Oh no, it's not available, maybe you could rewrite your website in rust to handle the load? ;)
...this is a tongue-in-cheek comment. I do agree with the author: the "Rust rewrite" alternative of wrapping the C library is way better in many cases for well-established or legacy libs. However, it depends on the momentum you can bring, as a pure Rust implementation of widely used functionality can benefit from the compiler grinding through it. I can see Rust-written low-level libraries used across all high-level languages in the near future.