Rust seems like a great choice for systems programming, especially considering that a well-tested, battle-hardened codebase like SQLite faces problems [1] solely due to the nature of the language it's written in.
Not sure why you're being downvoted. I don't see why we shouldn't be moving to languages like Rust given the chance, as C makes it far more difficult to write safe and correct code.
I expect it's because the first part is essentially preaching to the choir, and because Richard Hipp very much disagrees with the second part.[0]
> Rewriting SQLite in Rust, or some other trendy “safe” language, would not help. In fact it might hurt.
(see link for expansion on that matter, which is a question of tooling and testing)
Disappointing to see Hipp make that argument. It's trivially refuted.
Yes, all programming languages allow the programmer to write bugs. But languages very much vary in how many, and what kinds of bugs programmers write in practice. Saying "well, Rust doesn't eliminate all bugs" is attacking a straw man. If you want to argue that Rust isn't worth it, you need to convince me that C plus gcov results in fewer bugs in the important areas in practice than Rust (plus kcov [1] if you like) does. I think that's going to be pretty hard. (Especially if memory safety issues are the most important bug class you're concerned about: I think it's completely impossible for any C-based solution to compete with Rust here, regardless of how much tooling you add.)
Drawing an equivalence between undefined behavior and compiler bugs also doesn't make sense. Compiler bugs are way way less commonly encountered than undefined behavior in C. Also, they're qualitatively different: compiler bugs get fixed in new compiler versions, while UB is by design and doesn't get fixed.
> If you want to argue that Rust isn't worth it, you need to convince me that C plus gcov results in fewer bugs in the important areas in practice than Rust (plus kcov [1] if you like) does.
I don't have a dog in this fight, but I don't see how the burden of proof is on Hipp rather than the folks proposing the change. In other words, shouldn't the "rewrite it in Rust" folks have to prove that the cost of their proposed rewrite will be justified?
> In other words, shouldn't the "rewrite it in Rust" folks have to prove that the cost of their proposed rewrite will be justified?
Agreed, they do. But that it hasn't been proven to make things better doesn't mean it will make things worse. It just means that we don't know enough to say. The right way to answer the "would this software have fewer bugs if it were rewritten in Rust?" question requires a detailed look at what bugs the software has empirically encountered.
A rewrite automatically makes things worse because you start with no code.
I figure it's one of the signs of programmer maturity, that you start to look askance at rewrites. So tempting, yet so rarely even finished let alone better.
In the case of Rust I don't believe this is 100% true, given C ABI compatibility. You could start rewriting in such a way that it is integrated with the existing code and slowly but surely tease the C out of the system.
It would for the longest time be a C program with a metastasizing wart of Rust hung off the side, impossible to get into, impossible to work with, debugging hell, compilation hell. The distros would weep.
Sure, the primary burden of proof is on those proposing a change. However, any time you stand up and make an argument, the burden is on you to make sure it actually makes sense, and that goes for both sides.
This is getting a bit meta, but I disagree. It would be trivial to abuse in discussions.
A: Bash would be way better for SQLite, really!
B: But Bash is a terrible choice because X, Y, Z, ...
A: If you make those arguments, you have to prove them.
I'm not sure I take your point. If X, Y, and Z are cogent enough to be worth a rational response, B has done their job under my principle above. A is precisely the one I would want to yell at for using specious arguments.
Anything that is stated without proof can be refuted without proof. Why do you think your opinion should be held to a higher standard than anybody else's?
...I don't. That's sort of my point. Both sides of a debate have the same responsibility to be reasonable, whatever that level of responsibility may be, whether it's a formal debate or just tossing ideas around.
I like to swing it the other way: you have to prove to me that the newly proposed solution is not worth it. You can check out Amazon's policy about driving change. The "will not work" camp has to do the work to prove that something is not going to work. I think this approach yields better results in practice.
'Saying "well, Rust doesn't eliminate all bugs" is attacking a straw man.'
This is itself a straw man.
Follow the link to Mr. Hipp's comments and read them. He did not say this.
That a programmer who has produced such high-quality and rigorously tested software as SQLite should be portrayed as either cavalier or naive about software quality is something I find profoundly misguided.
> This is itself a straw man. Follow the link to Mr. Hipp's comments and read them. He did not say this.
"Rust doesn't eliminate all bugs" is a rephrased version of "Some well-formed rust programs will generate machine code that behaves differently from what the programmer expected."
> That a programmer who has produced such high-quality and rigorously tested software as sqlite should be portrayed as either cavalier or naive about software quality is something I find profoundly mis-guided.
I don't think he's cavalier or naive about software quality! SQLite's quality speaks for itself, and he is completely correct about what you need to do to achieve the level of correctness that SQLite has. He knows a lot more than I do about the rigor needed to ensure that amount of software quality. I do think, though:
1. The amount of testing that SQLite undergoes is not economically feasible for most software.
2. Rust's testing tools would allow for the same level of code quality in the absence of evidence otherwise. (The example he cited, code coverage, is wrong, as kcov is available for Rust.)
3. There is value in static analysis above and beyond the value of testing (also vice versa), because testing only reveals bugs that manifest themselves in inputs available in the test suite. This is true regardless of the code coverage of the tests.
In other words, I think Hipp's argument would make sense if it were something along the lines of: "I couldn't test a Rust SQLite as well as I can test the existing SQLite because kcov is missing features X, Y, and Z, and the value I get from the static analysis is less than the value I get from those code coverage tools, since we've found lots of bugs from those code coverage features and haven't found as many memory safety/data race bugs". That'd be totally valid. But as a blanket statement that Rust would make things worse, it doesn't make sense to me.
I was not aware of kcov or its basis bcov. Thanks for pointing those out.
However, a quick glance at the bcov source code leads me to believe that it only does source-line coverage, not branch coverage. So, unless my quick reading of bcov sources is mistaken, I couldn't test a Rust SQLite as well as I can test the existing SQLite because kcov/bcov is missing the ability to measure coverage of individual machine-code branch instructions, and the value I get from static analysis is much less than the value I get from branch-coverage testing tools.
That's not being pedantic, btw. The difference between source-line coverage and machine-code branch coverage is huge. The latter really is necessary.
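To make that distinction concrete, here's a small Rust sketch (the function is hypothetical, just for illustration). A single source line can contain several machine-code branches, so line coverage can read 100% while branch coverage correctly reports untested paths:

```rust
// A line can be fully "covered" while its branches are not.
fn clamp_positive(x: i32, limit: i32) -> i32 {
    // This one line compiles to (at least) two conditional branches.
    // A test calling clamp_positive(5, 10) marks the line covered,
    // yet never exercises the `x > limit` or `x < 0` outcomes.
    if x > limit || x < 0 { 0 } else { x }
}

fn main() {
    println!("{}", clamp_positive(5, 10));  // prints 5 (happy path only)
    // Branch coverage would flag the untested branches until we add:
    println!("{}", clamp_positive(20, 10)); // prints 0
    println!("{}", clamp_positive(-1, 10)); // prints 0
}
```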
Thanks, that's really great feedback. We should invest in making use of llvm-cov. It should dovetail nicely with the MIR work (which is now to the point where it can build the Rust compiler), since the language-specific IR it exposes is especially suited for quick development of instrumentation passes.
I don't think he's making the argument you think he's making. This is clearer in his later comments in the thread, in which he says that UB is much scarier in systems like Fossil.
His argument is that achieving the level of quality that SQLite has requires verification (in a broad sense) after the compiler, and that's what he does. If you consider the goal to be producing quality-assured binaries, then you can treat UB, compiler bugs, and many other things as falling in a similar category, which are almost certainly eliminated by an MC/DC test suite.
As you say, this isn't feasible for almost any software (as John said in the blog post, SQLite is the only program he knows of that has MC/DC testing when not required by law). But it does mean that rewriting SQLite in Rust wouldn't provide as much value as rewriting many other things where the binaries do not have such guarantees.
> His argument is that achieving the level of quality that SQLite has requires verification (in a broad sense) after the compiler, and that's what he does.
> If you consider the goal to be producing quality-assured binaries, then you can treat UB, compiler bugs, and many other things as falling in a similar category, which are almost certainly eliminated by an MC/DC test suite.
I don't think that they're eliminated because dynamic testing can't eliminate everything—it only finds bugs given its test inputs.
Moreover, though, I'm also skeptical of the claim that binaries are all that matter. Lots of software projects (for example, Firefox) import SQLite as source into the project instead of using the binaries. They upgrade their compilers without running the SQLite test suite to catch regressions. (Firefox might, but I'm sure lots of other projects using SQLite from source don't.)
> But it does mean that rewriting SQLite in Rust wouldn't provide as much value as rewriting many other things where the binaries do not have such guarantees.
Sure; static analysis is more useful in systems that aren't as well dynamically tested. But static analysis still has value.
> I don't think that they're eliminated because dynamic testing can't eliminate everything—it only finds bugs given its test inputs.
I can't think of an example of UB-exploit in the compiler that wouldn't result in different branch behavior, and thus, I think, in failure of MC/DC. But I may simply be insufficiently imaginative.
John makes the point about source code in a follow-up comment, and I agree, but I also see Hipp's perspective.
> I can't think of an example of UB-exploit in the compiler that wouldn't result in different branch behavior,
What about Implementation-defined Behavior? This is (I think!) technically a subset of Undefined Behavior and it permits such things as setting values[1] to arbitrary (but well-defined!) values on various operations, such as "excessive" left shifts. What I'm saying is that a compiler is permitted to substitute IB for UB and still be conforming. So it could start to do strange things to arithmetic, etc. Does that make sense as an example of what you're thinking of?
EDIT: [1] I obviously meant memory locations... as referred to by "variables" which aren't really variables, but are really binders/aliases. But here we are.
I think you're confused about implementation-defined and undefined behavior. The former is not a subset of the latter, but excessive left shifts are UB.
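For comparison, Rust handles the same shift case without any UB: the checked variant turns an out-of-range shift into a value the caller must handle (a minimal sketch):

```rust
fn main() {
    let x: u32 = 1;
    // In C, shifting a 32-bit value by 33 is undefined behavior and the
    // compiler may assume it never happens. Rust's checked variant makes
    // the out-of-range case an explicit value instead:
    assert_eq!(x.checked_shl(3), Some(8));
    assert_eq!(x.checked_shl(33), None); // too-wide shift: no UB, just None
    println!("ok");
}
```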
> "Rust doesn't eliminate all bugs" is a rephrased version of "Some well-formed rust programs will generate machine code that behaves differently from what the programmer expected."
I think that sounded more like "Many of the programs which are UB in C are also UB in Rust, even though not (yet) specified as such."
> I think that sounded more like "Many of the programs which are UB in C are also UB in Rust, even though not (yet) specified as such."
Well, this seems either (a) false or (b) uninteresting to me. It's false because C and Rust have different semantics, and Rust rules out lots of programs that C doesn't. It's uninteresting because if the point is that Rust has accidental undefined behavior due to compiler bugs (and it does), then that UB is so rarely hit that it doesn't matter nearly as much in practice as the UB in C.
CPUs have bugs, too. Do we consider Java an unsafe language because of rowhammer?
Help me out here, I don't know much about Rust. Which bad programs does Rust rule out?
I know about the borrow checker, but I don't think ownership bugs are a type of bug that Mr. Hipp frequently produces. It's a program design issue -- not something you think about at every single line you write. A well-designed program does not do many ownership transfers.
As someone else noted, out-of-bounds errors are sadly a pain to debug in C. For testing code, C has at least valgrind (and certainly many less well-known tools). For production code, you might not want dynamic index checking. It would invalidate performance arguments for Rust.
I think the real nasty cases of UB in C, which frequently occur as not statically refutable, are signed arithmetic overflow and shifts. As with out-of-bounds errors, I don't think Rust has a better story here -- only better built-in tooling.
> I know about the borrow checker, but I don't think ownership bugs is a type of bug that Mr. Hipp frequently produces.
The borrow checker eliminates use-after-free. And use-after-free is one of the most common types of vulnerability exploited in practice today, if not the single most common. (For evidence, look at reports about Pwn2Own.)
Index checking is quite cheap for most programs, and LLVM is good at eliminating redundant checks. As a useful comparison in another languages, Chromium compiles with bounds checks on std::vector by default (the [] operator, not the at() method which is required to be checked.) Actually, there are a lot of issues with Rust's current machine code output that I think slow it down, but index checking isn't one of them.
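To illustrate the elision point (a sketch, not SQLite code): in the indexed loop below, the `v.len()` bound lets LLVM prove every `v[i]` is in range, so the per-access checks are optimized away; the iterator version never emits them in the first place.

```rust
fn sum_indexed(v: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i]; // bounds check provably redundant, eliminated by LLVM
    }
    total
}

fn sum_iter(v: &[i32]) -> i32 {
    // Iterators sidestep indexing entirely, so there are no checks to elide.
    v.iter().sum()
}

fn main() {
    let v = [1, 2, 3, 4];
    assert_eq!(sum_indexed(&v), sum_iter(&v));
    println!("{}", sum_indexed(&v)); // prints 10
}
```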
I don't think you can count use-after-free as exploitable. Sure, if you have them (and I think those cases can largely be ruled out by good design), it leads to crashes. But for memory type exploits you need control over the value in it. Not a security specialist but I'm not aware of common attacks besides overflowing buffers.
In hot paths (like codecs, compression algorithms...) I'm sure you never want bounds checking. It can be optimized away for sequential loops of course, but not so easily in the case of data lookups.
"While it is technically feasible for the freed memory to be re-allocated and for an attacker to use this reallocation to launch a buffer overflow attack, we are unaware of any exploits based on this type of attack."
I have no idea what that article is talking about. Use-after-free exploits have been extremely common for years. Like I said, look at Pwn2Own (which requires submitting an actual working exploit): https://www.google.com/search?q=pwn2own+use-after-free
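For what it's worth, the mechanism by which the borrow checker closes this hole fits in a few lines (a toy sketch, not from any real codebase):

```rust
fn main() {
    let s = String::from("payload");
    {
        let r = &s;        // shared borrow of `s`
        println!("{}", r); // fine: `s` is still alive here
    }                      // borrow ends
    drop(s);               // `s` is freed here
    // Reordering this so that `r` outlives `s` and is used afterwards
    // simply fails to compile: the use-after-free is rejected
    // statically, not found at runtime or by a fuzzer.
}
```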
Gah, it seems this one isn't good. I trusted OWASP and skimmed; it seems I agree with the sibling commenters that this is more dangerous than presented there.
> If the requested behaviour is unspecified (just `a+b`), overflows cause runtime panic.
As far as I know most machines can't trap on signed integer overflow. Which means having runtime panic is not practical. The page you linked says "no checking by default for optimized builds".
Ok, I simplified this too much. For full details of what happens you'll have to read the docs.
The non-release mode panics. Release mode with debug asserts turned on panics. Release mode with forced overflow checks panics. Release mode with no special options results in the same result as a wrapping operation.
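Concretely, the explicitly requested behaviors are always well-defined (a small sketch of the modes described above):

```rust
fn main() {
    let a: u8 = 250;
    let b: u8 = 10;
    assert_eq!(a.wrapping_add(b), 4);     // modular arithmetic: 260 mod 256
    assert_eq!(a.checked_add(b), None);   // overflow surfaced as a value
    assert_eq!(a.saturating_add(b), 255); // clamps at u8::MAX
    // Plain `a + b` panics in debug builds; in a default release build
    // it wraps. Either way the behavior is defined, never UB as in C.
    println!("ok");
}
```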
> 1. The amount of testing that SQLite undergoes is not economically feasible for most software.
But the argument isn't about "most software" - it's about SQLite in specific. And that's the crux of the issue. Saying that he believes it would be counterproductive to rewrite SQLite in Rust at this time is not saying that that would be true of all or most or some other programs.
> Saying "well, Rust doesn't eliminate all bugs" is attacking a straw man.
Yes, it's very much akin to saying "well, a seat belt won't eliminate all deaths". I would understand an argument of "I'm not prepared to rewrite the software at this time" or "I'm not familiar enough with replacement X to assess whether it's a good choice", but at some point more people will have to acknowledge C's failings with more than lip-service. That doesn't mean Rust has to be the solution/replacement, but something does.
Nor does Rust let you trigger "dangerous" UB in non-unsafe code. This isn't going to change, ever, so the argument that "give it time, Rust will soon be chock-full of UB" is moot too.
Yeah, it's one of Rust's strongest features, as it allows progressive migration of native code, and also allows Rust to serve as a fast/low-overhead extension language for Ruby and Python (etc.).
The quality of tooling, and the ability of experts to verify the output of machine code is a really important point. I think I'd agree, rust at this point would hold back an elite developer like Richard Hipp.
The promise of rust, which may or may not be realized is pushing some very common problems down to the compiler. All code has bugs, so the compiler probably does things wrong in some cases. As the tools mature, these will get scrubbed out, just like every other project.
That said, array bounds checking is responsible for so many problems, it seems worth it to raise the minimum for a language. Not every developer is elite. In fact, they're pretty rare. C requires you to be really smart all the time, or at least be aware of when you're not smart enough to get a chunk of code right. Looping over some bytes from a file shouldn't be that risky. rust lets me, a less than elite developer, save my few moments of brilliance for the hard part of a program, rather than having to worry about the evaluation order of foo(i++,i++);
Maybe, it'll turn out the only way to make good software in the future is to find the best 100 developers in the world, and get them to make stuff. I doubt it though, being able to leverage the other million of us to make stuff, and have some (a lot?) of confidence it'll be free from the most common C errors is valuable.
Nobody should be forced to use tools they don't like. rust is trendy, but it has some very good ideas. Trendyness isn't reason alone to dismiss its approach. blah blah rust cheerleading blah blah.
I think the idea that "elite" developers can write bug-free C (or even just network-facing C free of security-sensitive memory safety problems) is pretty well refuted at this point. Now you can write bug-free C if you're willing to spend enormous time and money on testing: this is to a first approximation what SQLite did. But that only makes economic sense for a small minority of projects. Just putting "elite" developers on a C project, without the huge verification cost, is by itself not enough to eliminate bugs, as much as we as hackers would like to think it is.
(The one exception may be, like, DJB. But djbdns/qmail are very unusual C programs in many ways.)
It will be interesting to see if the formal proof/verification work that is being done by NICTA for the seL4 project will mature into something that can be used practically elsewhere in industry.
From my understanding, NICTA's proofs are basically algorithms to show that Haskell models and C code are equivalent. They can then go on to do proofs with their Haskell models.
> C requires you to be really smart all the time, or at least be aware of when you're not smart enough to get a chunk of code right.
It's not so much that it requires smarts. It's that it requires you to be ever-vigilant and to never make any mistakes. (That's why UB and bounds overflows are so devastating to software security. Almost any slip-up by the developers can be exploited.)
Incidentally, the ever-vigilant bit is also why we really want compilers to be doing the bounds-checking (or proving that it isn't necessary). Compilers are really good at being ever-vigilant. Humans (no matter how smart)... not so much.
> That said, array bounds checking is responsible for so many problems, it seems worth it to raise the minimum for a language.
Amen to that! If I could just add one feature to C - at least as an option - it would be array bounds checks.
I have no trouble with manual memory management - a garbage collector is nice to have, but I have not had many problems with memory leaks or dangling pointers. And the ones I had were relatively easy to locate and fix.
But array bounds violations are so easy to commit and so nasty to track down... When I still wrote C code for a living, I would have gladly sacrificed quite a bit of performance to get bounds checks on all array accesses, at least for testing and debugging...
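That wish is roughly what Rust's slice API gives by default (a trivial sketch):

```rust
fn main() {
    let a = [10, 20, 30];
    // Every access is checked: `a[9]` would panic with an
    // index-out-of-bounds message pointing at the offending line,
    // instead of silently scribbling over adjacent memory as in C.
    assert_eq!(a[1], 20);
    // Or make the failure a value rather than a panic:
    assert_eq!(a.get(1), Some(&20));
    assert_eq!(a.get(9), None);
    println!("ok");
}
```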
Well. I'll gladly admit that the C code I have written was way less complex than a web browser. When I still wrote C code for a living, I worked on an application suite that was ~250k lines of code, maybe ~300k, in total, whereas e.g. Firefox is a couple of million lines. I assume other mainstream browsers are similarly big.
Plus, what a browser does is, by nature, a lot more complicated than what I worked on.
EDIT: What I am trying to say is: I do not mean to trivialize use-after-free bugs, but in my personal experience, I did not run into many, while I witnessed (and caused, I am afraid) a bunch of array bounds violations, and they sometimes made me want to cry.
> You don't want to be writing your interpreters in C.
Well, technically, I think most browsers these days are written in C++, but your point remains valid.
OTOH, the "default" Python interpreter is written in C, and so is Perl (Ruby MRI, too, I think, but I am not 100% certain). I cannot recall any major security problem with those languages that originated inside the interpreter (which, of course, does not mean those did not/do not exist). Then again, a web browser is probably far messier in terms of what input is has to deal with.
> Then again, a web browser is probably far messier in terms of what input it has to deal with.
That's the ticket, the browser does have a large attack surface but more importantly it's supposed to safely execute completely arbitrary and untrusted payloads. In the same category are pretty much all of the usual suspect of security issues: flash, java (applets), …
Most interpreters are only fed trusted payloads, lest the developer starts eval'ing stuff they got from god knows where, and in that case the fault is usually laid to the developer's feet rather than the interpreter's.
Yeah. The browser is more like a hypervisor that Amazon might be running to host arbitrary people's VMs. But it has an unimaginably larger surface area than Xen.
Prof. Regehr did not find problems with SQLite. He found constructs in the SQLite source code which under a strict reading of the C standards have “undefined behaviour”, which means that the compiler can generate whatever machine code it wants without it being called a compiler bug. That’s an important finding. But as it happens, no modern compilers that we know of actually interpret any of the SQLite source code in an unexpected or harmful way.
At some point some popular compiler is going to make a subtle but important change to some undefined behavior that's not going to be immediately obvious as to its repercussions, and the fallout will be massive. It boggles my mind the mental contortions people will go through to justify what is essentially an argument of "it hasn't caused a problem yet" while ignoring that it's caused many problems already, just not ones that they've noticed or that have affected them.
> At some point some popular compiler is going to make a subtle but important change to some undefined behavior that's not going to be immediately obvious as to its repercussions, and the fallout will be massive.
Wait until you realize that the length of a byte in C is not clearly defined. Someday a processor will come along where a byte is 6 bits, and the fallout will be massive. (Really, it's happened before).
No, actually that processor would not become popular because no one would use it.
The reality is unless you are using a formally-defined language like ML, you are relying on undefined behavior in your language.
The number of bits in a byte is specified by the CHAR_BIT macro, which is required to be at least 8. It's commonly larger than 8 on C compilers for some DSPs, but almost universally 8 for hosted implementations.
Did you see the short-lived attempt to create "friendly C" a few months ago? [1]
There are languages that have a few undefined corners, and then there's C. A sufficiently large difference in quantity becomes a difference in quality. C is qualitatively worse than most modern languages with its undefined behavior. (Granted, some modern languages escape by having the one implementation, which is then the definition. But still, that's less undefined than C.)
It was too ambitious. Some of those examples, like shifting by 32, are about trying to remove even unspecified behavior. If you could reduce most cases of undefined behavior down to "what some real machine would do", it would be an enormous help, and wouldn't be anywhere near as hard.
For example you could say that division by zero will either give a result or trap, but that it can't do anything else, and the code path cannot be ignored. Or that an uninitialized variable is equivalent to initializing it with a semi-random number.
Even if an out of bounds array access will cause untold chaos, you can at least specify that it will cause that chaos at X bytes past the base of the array.
Merely creating an invalid pointer would be, on architectures where it doesn't trap, 100% harmless.
And for crying out loud, is forgetting to terminate a string literal with a " still undefined? There are so many bits of undefined behavior that are easy to remove.
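For comparison, Rust already sits roughly where that proposal wants C to be on the division example (a sketch):

```rust
fn main() {
    let a: i32 = 10;
    // Integer division by zero is defined to panic in Rust; the code
    // path can never be silently assumed unreachable and deleted.
    // The checked form turns the failure into a value instead:
    assert_eq!(a.checked_div(0), None);
    assert_eq!(a.checked_div(2), Some(5));
    println!("ok");
}
```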
Because uninitialized variables are allowed to change values arbitrarily over their (nonexistent) "live range". Your proposal would force them into having a real live range, with a stable value. See this section, which shows examples of the kinds of optimizations this opens up:
There's also the issue that having a stable value forces the register allocator to keep a live range for the undefined variable, which can cause unnecessary spills or remat in other places. Especially on 32-bit x86, this can be a problem.
> Because uninitialized variables are allowed to change values arbitrarily over their (nonexistent) "live range".
Even specifying that they have a new arbitrary value on each access would be a big improvement over the status quo. It wouldn't allow nasal demons. Since LLVM seems to already have these semantics, it makes a good argument that it wouldn't hurt C's performance to tighten the spec at least that much.
But I'm not seeing how C or C++ code gets you in a situation where you're purposefully doing arithmetic or bitwise operations on uninitialized variables. If it almost never happens, it doesn't need to optimize particularly well. What parts of the STL can be faster by treating uninitialized variables as impossible?
> There's also the issue that having a stable value forces the register allocator to keep a live range for the undefined variable, which can cause unnecessary spills or remat in other places.
This would only happen when the variable is accessed multiple times. In which case having to keep the value safe is no worse than if it actually had been initialized. I don't see how this is a problem.
> What parts of the STL can be faster by treating uninitialized variables as impossible?
What I'm mostly thinking of is allowing unused branches to be pruned. The STL tends to get inlined really heavily, which results in a whole pile of IR being emitted for what look like very simple operations. Based on the actual parameters and state, the optimizer then wants to prune out as much dead code as possible to reduce i-cache footprint, and sometimes time as well.
Take small string optimization. That optimization requires a branch on length in almost every string operation. But if you have a std::string of constant size, you don't need the heap spilled code to be emitted at all. Usually the only way to work this out is inlining + constprop + DCE. That's where undefined value semantics are really helpful: a branch on an undefined value can be completely removed as undefined behavior, which can make its branch targets unreachable, and allow them to be removed, and so on recursively. Undefined values allow entire CFG subtrees to be eliminated in one fell swoop, which is an extremely powerful technique for reducing code size.
I don't have a precise example off the top of my head as to where this kicks in in the STL, but I strongly suspect it does.
I still don't understand. Why would length be undefined if that's how you tell whether a string is small or not?
Even if you can remove one of the branches because you know if the string is small, the logic of "this branch can't happen" -> undefined -> delete sounds more complex than "this branch can't happen" -> delete.
The order is undefined -> "I, the compiler, declare this branch can't happen" -> delete. The middle step is valid because "undefined behavior" permits the compiler to make that declaration, then act on it. If you don't want it to do that, use defined behaviors only, which is a great deal easier said than done. Partially because of how hard it is to avoid it in your own code, and partially because it is shot through all the other library code (which as pcwalton points out, courtesy of aggressive inlining, is also your code).
I was skeptical about all this about six months ago myself, but the continuous stream of articles on this topic, plus the spectacular and highly educational failure of Friendly C (and not just that it failed, but why it failed, which is why I posted that exact link) has satisfied me. It is also part of why I've stepped up my own anti-C rhetoric since I've been so convinced... as bad as I thought C was, it really is, no sarcasm, not merely "hating", even worse than I thought. I am honestly scared to use it professionally and pretty much refuse to touch it without good static analysis support.
I just don't understand how "std::string of constant size" leads to the compiler inferring that a variable is undefined along a particular code path. In particular, on the not-taken code path, what is the uninitialized variable being branched on?
Edit: Is it figuring out that the pointer in the string object is uninitialized? That doesn't seem any easier than reasoning about the length value, and I don't see how it would lead to "br undef".
LLVM has been doing this optimization since 2008, and it contains justification in the commit message. LLVM commit #60470:
Teach jump threading some more simple tricks:
1) have it fold "br undef", which does occur with
surprising frequency as jump threading iterates.
...
Chris didn't cite any specific numbers in the log, but I believe him when he says it actually happens. You should be able to run "opt -O2 -debug-only=jump-threading" and see where it does :)
Related, with some actual numbers to prove it helps, LLVM commit #138618:
SimplifyCFG: If we have a PHI node that can evaluate to NULL and do a load or
store to the address returned by the PHI node then we can consider this
incoming value as dead and remove the edge pointing there, unless there are
instructions that can affect control flow executed in between.
In theory this could be extended to other instructions, eg. division by zero,
but it's likely that it will "miscompile" some code because people depend on
div by zero not trapping. NULL pointer dereference usually leads to a crash so
we should be on the safe side.
This shrinks the size of a Release clang by 16k on x86_64.
For the price of removing some of the rusty nails sticking out of C, I'll happily pay a few 16KBs on a binary as large as clang. Plus, in most or all of those cases the optimization would still be possible even with stable uninitialized values; it just needs to be done in a different way. You might be able to get 13 of those 16KB back with a very minor amount of work.
> For the price of removing some of the rusty nails sticking out of C, I'll happily pay a few 16KBs on a binary as large as clang.
Lots of LLVM users won't. The competition between LLVM and GCC is (or was) pretty brutal.
> Plus in most or all of those cases, the optimization would still be possible even with stable uninitialized values. It just needs to be done in a different way.
Without justification, I doubt that's possible. How? Are you familiar with all of LLVM's passes, how they interact, and with the code patterns generated by the STL?
I trust Chris in that he didn't add the jump threading optimization for no reason, and it's the main one that matters here. If he says that this occurs with "surprising frequency", sorry, but I'm going to trust him. Compiler developers are usually right about the impact of their optimizations. Submit a patch to LLVM to remove it if you like, but I highly doubt it'll go through. If it did go through, Rust (and perhaps Swift) would probably revert it, as we ensure that we don't emit UB in the front end, so losing the optimization hurts us for no reason.
The goal is to refine the semantics of C. Removing a valid optimization doesn't help that.
It's not that I think Chris is wrong; it's that I think other optimizations related to dead code have gotten better over the intervening years, and focused effort could improve them further.
I think "refining the semantics of C" is a doomed effort that we shouldn't undertake. We'll lose performance for very little benefit, if market game theory even made it possible (and it doesn't: benchmarks of C compiler performance are much more important to customers than friendliness of the C dialect). We should just be moving away from C instead, or if we must stick with C we should invest in dynamic checks for undefined behavior.
My position, by the way, is hardly uncommon among longtime C compiler developers.
Do you think C managed to get things exactly right? Or should we add more kinds of undefined behavior?
What do we do about the fact that undefined behavior actually makes it harder or impossible to write efficient code in some cases, like checking for integer overflow?
Even if your goal is performance over anything, there are a whole lot of undefined behaviors that have absolutely zero performance benefit.
> What do we do about the fact that undefined behavior actually makes it harder or impossible to write efficient code in some cases, like checking for integer overflow?
This is a good example of why we should be moving away from C. :) Signed overflow being undefined is basically necessary due to a self-inflicted wound from 1978: the fact that "int" is the easiest thing to type for the loop index when iterating over arrays. Nobody is going to go through and fix all the C code in existence to use unsigned, so we're stuck with that UB forever. The realistic solution is to start migrating away from C (and I have no illusions about how long that will take, but we'll never get there if we don't start now).
> Even if your goal is performance over anything, there are a whole lot of undefined behaviors that have absolutely zero performance benefit.
Sure. But compilers don't exploit those in practice (because compiler authors are not language lawyers for fun), so they're basically harmless in practice. They're essentially just spec bugs for the committee to fix in the next revision.
It just seems like such a waste to abandon C instead of smacking compiler/spec writers and getting them to specify behavior that was de-facto specified for decades.
> Even specifying that they have a new arbitrary value on each access would be a big improvement over the status quo.
This is approximately what LLVM describes undef as, and it indeed leads to nasal demons. This description enables a single value to both pass a bounds check, and then subsequently go out of bounds!
You're right that in certain cases it would still cause trouble, but it would be far fewer cases than 100%. Reading only once would be safe, and passing it to another function would produce a variable that can no longer change unexpectedly.
It's far more likely for a major compiler to exploit more undefined behavior for optimizations than for a processor with a bizarrely sized byte to become popular. The major compilers already do this.
> The reality is unless you are using a formally-defined language like ML, you are relying on undefined behavior in your language.
This is not true, not using the definition of "undefined behavior" provided in the C standard.
"while ignoring that it's caused many problems already"
Really? I believe Mr. Hipp is claiming that it hasn't. Do you have evidence to the contrary?
I don't think the argument is that these cases should be ignored -- they are now corrected, after all. It is that most of these cases should be treated as low priority compared to issues that are creating observable problems.
I take kbenson to be saying that what Mr. Hipp claims isn't a problem for SQLite is or has been a problem for other projects. That is, other projects have been bitten by relying on undefined behavior (such as what a variable's initial value might be) that may have been consistent across a variety of implementations, and then suddenly wasn't. Perhaps it's safe to say that today we don't need to address these issues on this project in this language, but that's taking on technical debt and setting ourselves up for work in the future (that may be difficult to identify at that time).
That was exactly my point, and I think the alternative interpretation makes no sense in light of the portion of my comment saying "just not that they've noticed or that have affected them."
> Perhaps it's safe to say that today we don't need to address these issues on this project in this language, but that's taking on technical debt and setting ourselves up for work in the future (that may be difficult to identify at that time).
Yes, that is succinctly saying something I was just implying. Even if a current C program is verified as having absolutely no problems with any current compiler due to the way it is using undefined behavior and the compilers interpret it, it's impossible to assume it will remain in that state, by the nature of the problem being examined. Undefined behavior is undefined, and thus may change. Now, any language may decide to change how something works, so every program has to deal with this at some level, but again by the nature of this problem, there's much less assurance that the problem won't be a small, subtle change, possibly in one compiler, which is missed until it's widespread.
This would be much less of an issue if there were a specific subset of C which defined most of the undefined behavior and which could be turned on with a flag. It would probably prevent portability in some respect, but I hope we've finally reached a place where portability is accepted as secondary to security.
I was making a general point using his statement as the impetus, not a specific point in the case of SQLite. Undefined behavior (specifically, developers relying on it being consistent) has caused many problems in the past, and will undoubtedly do so in the future. Relying on behavior which, by its definition, is impossible to rely on is not a situation I think we should be defending.
> The disagreement is not over whether or not [undefined behavior] is a problem, but rather how serious of a problem. Is it like “Emergency patch – update immediately!” or more like “We fixed a compiler warning” or is it something in between. –Hipp
In this specific case with SQLite, it certainly is more of a fixed-compiler-warning. However, you could imagine some undefined behavior in an SSL implementation requiring the emergency-patch level as well. I agree with you that undefined behavior "in most cases should be treated as low priority".
Can you suggest what that change might be related to? I, too, think it's less likely than one might think.
This discussion seems to miss a couple of things. SQLite is embedded quite often by programs written in C; would embedding a Rust library and possibly a runtime fix things? The parent program would still potentially have defects, and those defects could impact SQLite.
SQLite is also rather mature; I don't see how you can compare a rewrite to mature code. Take OpenSSL as an example: they aren't tossing it, they are fixing it, and it's a much shallower lift to fix it.
I'm all for some big Rust programs to prove its case, though: a mailer, a DNS daemon, some sort of database. Something useful cut from whole cloth, and ideally something we have historically not done well. I don't think a rewrite is it, though.
> Can you suggest what that change might be related to? I, too, think it's less likely than one might think.
Aliasing rules[1]? Compilers apparently don't agree on that now, so even if it were to coalesce into the same undefined behavior, one would have to change.
> This discussion seems to miss a couple of things. SQLite is embedded quite often by programs written in C; would embedding a Rust library and possibly a runtime fix things?
I'm not advocating for Rust as much as I'm advocating against C, or at least against C as it currently exists and is implemented with so much undefined behavior. I've argued elsewhere in this discussion that a special subset of C with as much undefined behavior as possible specifically defined, which could be enabled through a flag, would do wonders (even if at the expense of some portability).
> Take OpenSSL as an example, they aren't tossing it, they are fixing it, it's a much more shallow lift to fix it.
It's much shallower to review and patch it. Let's not kid ourselves that it will be fixed when they are done with it. There will likely be bugs regardless of the language used to implement a crypto library. Does that mean we should ignore it when one language allows an entire class of bugs that another does not, especially for a crypto library?
Let me make my case another way. What if C never caught on as the dominant language, or C was originally defined in a stricter manner, and most of the utilities and tools we take for granted were instead implemented in a language that had fewer undefined portions, allowed less undefined behavior, and thus had fewer bugs and security problems? Would the resultant small performance hit (due to being unable to optimize around the undefined behavior) outweigh the added stability and security, or would we have been better off? I think we would have been better off, and since C is still widely in use, I think it's not too late to make that case.
Undefined behavior is not always identifiable through static analysis. Obviously it can be checked against at runtime, but that's actually quite expensive. It would, for example, include bounds checks for everything, and overflow checks on all signed arithmetic.
That's not the worst of it: the truly intractable part is preventing use-after-free UB. The only ways to do this are (a) remove malloc from your language; (b) add a lifetime system (incompatible with all existing C libraries); (c) add a garbage collector (which most projects written in C will not accept for performance reasons).
I think that's mostly solved by reframing the goal from "remove/disallow all undefined behavior" to "define as much of the commonly relied-upon undefined behavior as possible, such that most programs need not really use it". Perfect is the enemy of good.
Are we arguing the same thing? If UAF is the cause of security problems, and UAF is undefined behavior now, redefining it under a special mode to mean "this is an error (whether or not the compiler enforces it)" at least clarifies the situation, and lets an included static analyzer report an error as needed and as it is capable.
That is, I'm not arguing that as much undefined behavior as possible should be made defined and permitted, just that it be defined. That definition may very well be "you are not allowed to do this. Don't do this."
And once you've eliminated that, you have DoS bugs like forcing infinite loops, abandoned (referenced but unused) memory leaks, and worst-case hash table insertions. All of those are serious attacks for anything with a resource budget.
He certainly has a point, but I think he goes too far in assurances about the current sqlite3.so.
Testing "every single instruction" does not guarantee that all C-level UB has been eliminated, because some bugs can be input-dependent. For example, even if your coverage tells you that this function has been tested, it could still trigger undefined behavior for other inputs that trigger overflow:
For another example of this issue, consider the fact that data race detectors do not eliminate all data races in practice, because they only find races that show up on your test inputs.
I ask this from a position of seeking knowledge rather than being adversarial. Discussions of undefined behaviour from a C language perspective always seem muddled to me, between (a) what's undefined at the C language level, such that a compiler might take advantage of it and produce results that would be unexpected looking at the code, and (b) cases where the machine code generated faithfully represents the language statements, but a bug may be present when processing some external input at runtime.
If the inputs here come from something external to the program, such that the compiler can't know the value of x, then the machine code should be faithful to the language statements. It's a bug if overflow occurs, which may have other runtime implications, but the compiler isn't going to remove some chunk of code, etc., because of it.
SQLite's testing page says they also test boundary conditions (I don't know to what granularity), which may catch something like this, and which agrees that instruction-level coverage isn't enough. They also run their tests with all the various sanitisers enabled.
Isn't this the same in Rust? The non-release builds would need to see a suitable test case for the overflow check to cause a panic.
I guess I'm asking: is this example relevant to concerns about undefined behaviour and optimising compilers vs. Rust? Since the function is input-value dependent, in practice isn't this more like implementation-defined behaviour at runtime, in that a platform will always provide a consistent behaviour, e.g. overflow, trap, saturate, etc.?
I've followed regehr's blog for a few years, I've read a lot about Rust, but I'm mostly working in higher level dynamic languages and don't have a lot of hands on experience with C or Rust and just wondering if I'm missing some subtlety here.
Frankly, SQLite is the exception that proves the rule.
For all the complexity of the SQL Language, and efficiency constraints SQLite needs to have, plus all the algorithms it has to implement, it still got a pretty well-defined task. It is not easy to transfer that experience to other systems. Say, your company's backend API. Completely different constraints.
It could still be the case that, had Rust existed when SQLite was being created, it would have taken much less engineering effort. That is the metric that matters, since given infinite manpower you can write anything in any language.
I do agree that rewriting SQLite now, which is a very battle-tested piece of software, would probably do more harm than good. I bet people will still try, for fun if nothing else.
Wow. I was surprised that comment came from a smart, accomplished guy. It's nonsense. What he's essentially saying is that the existence of compiler bugs in Rust refutes using it as a C alternative. Let me try that: the existence of hundreds of compiler bugs in C compilers over the past years means it can't be relied on either. CompCert had a few at the spec level, so we throw it out, too.
Realistically, you rewrite the components piece by piece in the new language. You make sure each compiles right with testing and review. You report any problems to the compiler team, who fixes them. Eventually, your whole app is in the safer language without a lot of work. You might even swap them, so the safer one becomes the reference code with the other kept around for any platforms not yet supported or too buggy. A diversity benefit, as Hipp mentions.
With all of this, the program becomes immune to most memory & concurrency issues while being easier to maintain. Its undefined behavior will probably be a fraction of C's in number and severity. That's a net win.
Note: I'd have told him Ada/SPARK instead of Rust given it's been stomping C in embedded safety for a long time w/ lots of tooling for verification activities already there. Counters his compiler maturity argument, too.
Let me re-read it. He makes several points. The first is that they're intentionally relying on undefined behavior known to cause problems, sometimes out of nowhere. He says it's OK to rely on it because it's not currently causing them problems. Relying on something impossible to rely on, as kbenson worded it, because it's working out so far. Reminds me of a George Carlin quip about people building villages on active volcanoes and then being surprised when lava turns up in the living room. Pretty foolish.
Next he points out Rust should reduce undefined behavior occurrences vs C while eventually having some of its own. That's correct, but he neglects the biggest benefit: its safety scheme prevents, by default, many of the flaws found in C projects. Leaving this out of his comment makes it Rust vs C on undefined behavior and compiler correctness only. That's a bad comparison, given that efficient memory safety is basically Rust's main benefit.
Next, he conflates compiler bugs with undefined behavior. They're not equivalent. One is an implementation failure to be remedied. The other is a design failure that will probably stay in the language and compilers indefinitely. He's falsely reframing the situation to prop up an argument.
Next, he delivers the argument: that compiler bugs mean you can't rely on Rust unless you check the machine code itself. I like that he checks machine code, as that's a high-assurance recommendation with proven value. He claims there aren't enough tools to get the job done, and that there's a lack of compiler diversity. The first might be true, and the second usually only matters if the compilers implement the same spec; otherwise, you're getting effectively different programs you can't compare directly. Plus, most GCC, LLVM, and Rust programs are performing just fine relying on one actively developed compiler without machine-code testing.
So, he's made some bogus claims, dismissed Rust's whole benefit package, focused discussion on machine code from buggy compilers, made claims about its verification which I lack knowledge to evaluate, and ignored field evidence that his focus area is a small problem. He's trying really hard to dismiss Rust entirely at compiler level without much to show for it. Incidentally, it takes much less writing for most of us to dismiss C on grounds of language or compiler safety. Something you don't see him doing. ;)
That said, the Rust community should invest in tooling for assembly/machine-level verification if he was correct in saying they don't have it. That will be important for OS and embedded work, where developers trust compilers very little. Past that, his comment's misdirection and level of bias deserve no charitable interpretations.
> Next, he conflates compiler bugs with undefined behavior.
Maybe Rust is just not specified for every corner case? The compiler could just do anything in such cases (e.g. what a C compiler would do -- not checking for arithmetic overflow, for example). You can then go ahead and claim it wasn't UB in Rust. But effectively it is the same, and you can't expect that Rust will specify that a compiler must check for arithmetic overflows in the future.
> Next, he delivers the argument: that compiler bugs mean you can't rely on Rust unless you check the machine code itself.
If there are more bugs in the Rust compiler than in GCC or LLVM (which I don't know, but it sounds reasonable to assume given Rust's age) then that's just a good engineer's pragmatic realism.
> Maybe Rust is just not specified for every corner case?
Even in the absence of a formal specification, if you demonstrate undefined behavior in safe code, it will be regarded as a high-severity bug and slated for correction. If it's a bug in the compiler, it will be patched. If it's a bug in the language itself, the language will be redefined to prevent that behavior in safe code and the implementation will be updated to reflect this. If these changes break existing code, then so be it: soundness fixes are an instance where the Rust developers reserve the right to break backwards compatibility. Rust takes UB seriously.
He and I both agreed UB could show up in Rust, although it defaults to quite a bit of safety. As far as compiler maturity goes, he could use that argument, but he counters himself with machine-code verification due to no trust in compilers. As in, what was the point of bringing up Rust compiler quality if he doesn't trust C's either?
The only valid point he has is that a systems language needs tools to produce and/or test machine code output for source equivalence. He claims Rust doesn't have that, but that's outside my knowledge. I know Ada, SPARK, CompCert C, and a Java subset have methods available.
> I don't see why we shouldn't be moving to languages like Rust given the chance, as C makes it far more difficult to write safe and correct code
I agree in principle; however, the timing is wrong.
Rust is a very new language (less than 6 years old), and is undeniably totally unproven. It is the latest "buzz" language, and may not be around long term... nobody knows.
What's more, a full-fledged rewrite of CoreUtils in Rust is unlikely to be used by any production environment, because this new CoreUtils will also be undeniably, totally unproven. These are the same growing pains LibreSSL has been experiencing: lots of gung-ho fans, but very few actual users (outside of OpenBSD/FreeBSD) - and their undertaking is arguably a lot easier, since they're just cleaning a codebase, not starting from scratch.
Average users who just consume distros are not going to switch to a new unproven CoreUtils (even if they knew how), and distro maintainers are not going to switch until it's proven either. It will take a huge company with a huge install base switching and testing it in production for many years before others start to feel comfortable... however, this is also an enormous burden on said mega-corporation, for little-to-zero perceived benefit.
Yes, in principle, it's "safer" code, but to a mega-corp with thousands of installs, the risk is too great. New bugs, language pitfalls, behavior potentially changing, etc. Perhaps Rust dies, perhaps it's replaced with an even better alternative. It will take a LOT of time to work all this out.
Flatly, rewriting a several-decades-old, mature codebase in today's flavor-of-the-week language is not a good idea. It's a waste of time and effort.
Let the languages mature more, do more systems work that doesn't involve replacing the foundation we all stand on... and maybe, in 5-10 additional years, we'll see where Rust goes.
> Flatly, rewriting a several-decades-old, mature codebase in today's flavor-of-the-week language is not a good idea.
It's odd that so many in the computing industry are unwilling to move on from a language from 1978. We think of ourselves as one of the most fast-moving industries, but we have this odd reverence for early C and Unix that makes us stubborn and resistant to change. The fact is: we didn't know how to do some things properly in 1978. We know more about programming language design now. (Even Rob Pike would presumably agree—that's why he created Go!)
To be sure, we shouldn't just rewrite things in new languages for no reason. I think many segments of our industry are too fad driven. But to me the right thing is simple: Let's evaluate new technologies on their merits. Rust may well be worse than C! But if it is, let's figure out why, and say so explicitly.
> The fact is: we didn't know how to do some things properly in 1978.
Actually, we did know how to do it properly: Extended Algol on the Burroughs B5000 was in use in 1961, to cite one example among many that were ignored by the UNIX authors, because they didn't want to spend too much effort designing a proper compiler.
> We think of ourselves as one of the most fast-moving industries
We're fast-moving because our foundation is solid and not changing (i.e. CoreUtils and gang). It's an assumption that these things "just work" with zero fuss and weirdness between systems.
We build on-top of these systems, so changing them out from underneath us all is a dramatic shift.
Perhaps Rust is the key to making these things better. I never claimed Rust is bad. I've only claimed that Rust may or may not be the right choice here, and since it's so young and unproven, we should wait before trying to re-write "all the things" in Rust. Today, Rust is a pet language... tomorrow, maybe not.
Remember, it took C many years to "catch on", and even longer to become the de facto standard for systems work. We can't rush this sort of thing... especially given the sheer magnitude of things depending on this code.
We should also be careful who actually does the re-write when the time comes. New CS grads who cannot understand the old C code and therefore feel [insert-new-hip-language-here] is better... are not the best ones to tackle this sort of endeavor. This requires deep, deep understanding of the entire package, how all the components interact, legacy behavior and the reasons behind design decisions, etc...
> We're fast-moving because our foundation is solid and not changing (ie. CoreUtils and gang).
That isn't a definition of "fast moving" that I would use. It sounds like slow moving.
Why is innovation in the core layers of the system less legitimate than innovation in social media apps? Boeing has no problem upgrading its engines every few years for better fuel efficiency. Why can they do that, while we can't do the same with things like GNU coreutils?
> Remember, it took C many years to "catch on", and even longer to become the de facto standard for systems work. We can't rush this sort of thing... especially given the sheer magnitude of things depending on this code.
Actually, Bell Labs had no problems with building their entire system front to back in their new unproven language C instead of using Fortran. I'm glad they went the way they did.
> That isn't a definition of "fast moving" that I would use. It sounds like slow moving.
Perhaps I didn't word it correctly.
When you want to write the next WhatsApp, you don't have to start from scratch. The OS has been taken care of for you, and you can expect it to "just work".
This solid foundation allows innovation to build on-top, at a rapid pace. If your foundation was constantly changing, you'd have to account for all sorts of weird intricacies, non-portable code, etc (like the old days).
A rapidly changing OS isn't really what you want for a production environment. In fact, you want it to be as constant as possible, so that you are free to do your work.
> Actually, Bell Labs had no problems with building their entire system front to back in their new unproven language C instead of using Fortran
You're right. But do remember it took a long time for it to propagate. Today, there are still systems not written in C (although they are in the minority). At the time, a lot of systems were written in pure assembler, and it took a long time to convince those guys that C was ready as a replacement for most tasks (and C changed dramatically in that time period).
I'm glad they went the way they did at the time -- but today with all these things built on top (financial markets, governments, big mega-corps, small ma n' pa shops, etc...), we need slower changes in order to keep the stability.
> At the time, a lot of systems were written in pure assembler, and it took a long time to convince those guys that C was ready as a replacement for most tasks (and C changed dramatically in that time period).
And they were wrong to resist C. They should have switched over sooner.
> today with all these things built on top (financial markets, governments, big mega-corps, small ma n' pa shops, etc...), we need slower changes in order to keep the stability.
Stuart Feldman, on why Makefiles insist on tabs:
"After getting myself snarled up with my first stab at Lex, I just did something simple with the pattern newline-tab. It worked, it stayed. And then a few weeks later I had a user population of about a dozen, most of them friends, and I didn't want to screw up my embedded base. The rest, sadly, is history…"
Stability is important, but eventually you do have to go back and fix things that are wrong for us to move forward.
> At the time, a lot of systems were written in pure assembler, and it took a long time to convince those guys that C was ready as a replacement for most tasks (and C changed dramatically in that time period).
Operating systems written in higher-level languages go back all the way to the '60s, like the Burroughs system, written in Extended Algol in 1961.
There were multiple other OSes written in variants of Algol and PL/I before C was even an idea in its designers' minds.
Only home micros were fully written in assembly by the time C was getting used outside AT&T.
It is an urban myth, spread by AT&T fanboys, that C was the first systems programming language.
Anyone who bothers to research the history of mainframes and operating systems will easily find documents on those systems, most of them written in higher-level languages with more memory-safe features than C has ever had.
Boeing's engines for passenger jets are not substantially different than they were 20 years ago, are they? coreutils has also seen feature additions, fixes, and optimizations over the years.
Alupis wrote:
"Average users who just consume distro's are not going to switch to a new unproven CoreUtils (even if they knew how), and distro maintainers are not going to switch until it's proven either. It will take a huge company with a huge install-base switching and testing it in production for many years before others start to feel comfortable... however this is also an enormous burden on said mega-corporation, for little-to-zero perceived benefits."
Seems like this is a good opportunity for Mozilla to demonstrate the resilience of Rust and its ecosystem by adopting Rust as the language of choice, where possible, and deploying in-house a custom Linux system in which Rust-written components are plugged in, as and when they are written and ready, with perhaps a full-fledged switch to Redox sometime in the future. If Rust and software written in it are pushed to their limits within Mozilla, then it's not an unreasonable recommendation for it to be deployed on an even wider scale.
> deploying in-house a custom Linux system in which Rust-written components are plugged-in
I don't see Mozilla doing this, really. There's no direct benefit, and there's a nebulous future benefit for Rust.
Mozilla is using Rust components in Firefox though (as well as Servo being mostly Rust). "Adopting Rust as the language of choice" seems to be happening already -- I've heard a lot of folks enthusiastic about (re)writing in Rust. This stuff takes time, though.
>Flatly, re-writing a several decade's old, matured codebase in today's flavor-of-the-week language is not a good idea. It's a waste of time and effort.
That's the sort of thinking we'd benefit from having less of. Legacy is a terrible burden.
Don't underestimate performance. I haven't seen the most recent benchmarks, but C/C++/Fortran are still unbeatable in raw speed. If you want maximum performance no matter what, these are the languages to choose, even if they sacrifice readability, maintainability or safety.
Well, rust is beating g++ in the benchmark game now. [1] (please don't take microbenchmarks seriously) Performance is complicated, rust has a couple of things going for it that will help out a lot. First, there's a pretty strong bias to use the stack. As opposed to java or scheme where stuff is heap allocated by default. Second, rust avoids the pointer aliasing problems of c and c++. When someone gets around to making a fast matrix library, it'll likely take much much less time to catch fortran than c++ took.
But again, performance is complicated. People have put tens of thousands of hours into c/c++ optimization. rust is young, so not so much time there. On the upside, rust has room to grow.
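To make the aliasing point concrete, here's a toy sketch (my own example, not from any benchmark): in C the compiler must assume two pointers may alias unless `restrict` is used, while Rust's `&mut` guarantees exclusive access, so reloads through the shared reference can be optimized away.

```rust
// Sketch: a C compiler must assume `a` and `b` may alias unless the
// programmer writes `restrict`; Rust's `&mut` guarantees exclusivity,
// so the optimizer may cache `*b` across the writes through `a`.
fn add_twice(a: &mut i32, b: &i32) -> i32 {
    *a += *b; // a write through `a` cannot change `*b`...
    *a += *b; // ...so this reload of `*b` can be eliminated
    *a
}

fn main() {
    let mut x = 1;
    let y = 10;
    assert_eq!(add_twice(&mut x, &y), 21);
}
```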
That is, in this general form, wrong. Most compiled languages can match or sometimes exceed C speeds, depending on the task at hand and the algorithm chosen. There are a lot of very inefficient C programs around - because they are badly written. More high-level languages allow the programmer to focus on speed where it matters. And in the current day and age, there should be no reason to favor raw speed vs. program correctness and safety.
You see, code size and linearity are very important for anything non-trivial and hot. New languages tend to pile abstractions upon abstractions, making understanding and optimizing a pain.
Well, I have plenty of experience working with SBCL, which gives you direct access to the assembly produced, allowing extremely good control of the code quality in hot spots. Just
(disassemble 'my-function)
would give a nice printout. There are also lots of other languages which might be high-level but still give you excellent code quality - I used Modula-2 a long time ago, and back then it beat the resident C compiler in generated code quality. Yes, many C compilers are very good and some newer ones not so much, but claiming in general that "C" is fast is an oversimplification.
Difficulty is relative. If you don't study modern C idioms your C code will be crap. Same goes for Rust.
Honestly, using higher-level languages is the same mentality as taking a pill to magically lose weight. It's quick, but detrimental (to a programmer's ability) in the long term.
Programming becomes easy but in the long term most people forget how algorithms and data structures work, in addition to cache mechanisms and other optimizations.
Until hardware changes drastically there's no sense rewriting everything (unless of course it's just for fun). It's better use of time to study Math and lower-level concepts instead.
That being said, most businesses will take the quick pill instead.
> Difficulty is relative. If you don't study modern C idioms your C code will be crap. Same goes for Rust.
That's not a valid argument for why better tooling can't help alleviate some of the difficulty.
> Honestly using higher-level languages is the same mentality as taking a pill to magically lose weight. It's quick, but detrimental (to programmers ability) in long term.
If that were true, the most effective programmers would only code in assembly.
> Programming becomes easy but in the long term most people forget how algorithms and data structures work, in addition to cache mechanisms and other optimizations.
Why would higher level languages obviate knowledge of any of these things? If anything, I think it would help by reducing "noise" from incidental complexity; e.g. ownership in C vs Rust.
>If that were true, the most effective programmers would only code on assembly.
No, because C is as fast as hand-coded assembly in most of the cases. The same can't be said of any high level language in comparison to C (save for C++ and Fortran).
> The same can't be said of any high level language
Rust is basically a bare-metal systems language, just like C. It has a nicer type system, but it compiles to more or less the same thing in the end. Unless you make an error such as using freed memory, in which case it doesn't compile, while the equivalent C code would often compile to a very high-performance foot-gun. Rust's safety is paid for, for the most part, at compile time, unlike the usual high-level languages like Java/C#/Python/... which pay for it at runtime.
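A tiny sketch of that trade-off (my own toy example): the commented-out line is a use-after-free that the borrow checker rejects at compile time, at zero runtime cost.

```rust
// Sketch: the borrow checker rejects use-after-free at compile time.
// The commented-out line is the "foot-gun" that C would happily compile.
fn main() {
    let v = vec![1, 2, 3];
    let first = v[0]; // copy out what we need before freeing
    drop(v);          // explicitly free the vector here
    // println!("{}", v[0]); // error[E0382]: borrow of moved value: `v`
    assert_eq!(first, 1);
}
```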
> No, because C is as fast as hand-coded assembly in most of the cases.
Most hand-coded assembly is pretty slow. C isn't any pinnacle of performance either.
C programs are compiled to some non-existent abstract machine that ignores the real variations in memory architecture and CPU implementations. The compiled binary can't adapt at runtime.
I don't know of any C implementations that take advantage of runtime information for optimization purposes. So you end up with generated code with a lot of redundant computation, tests, branches and pointer dereferences (such as function pointer dereference that always refers to same address), just because they might be necessary with some input -- input that wasn't the case this time.
A single mispredicted branch is expensive. Say branch mispredict takes 15 cycles. That's enough to do up to 480 (32*15) 32-bit floating point operations on a single core. Ignoring runtime information takes us pretty far from anything you can call optimal code.
The current crop of compilers are also pretty bad at vectorizing anything complicated. Those cases it can be pretty trivial to beat the compiler by 2-10x, in some cases even 40x+ if your vectorization can also eliminate a lot of unpredictable branches.
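For illustration, a toy sketch (names and data are mine) of trading an unpredictable branch for a branchless mask, the kind of transformation compilers often miss on anything complicated:

```rust
// Toy sketch: summing elements above a threshold with a branch vs.
// with a branchless mask. On unpredictable data, the branchy loop pays
// a misprediction penalty per element; the masked loop never branches
// on the data at all.
fn sum_above_branchy(data: &[i32], threshold: i32) -> i64 {
    let mut total = 0i64;
    for &x in data {
        if x > threshold {
            total += x as i64; // mispredicts often on random input
        }
    }
    total
}

fn sum_above_branchless(data: &[i32], threshold: i32) -> i64 {
    let mut total = 0i64;
    for &x in data {
        let mask = -((x > threshold) as i64); // all ones or all zeros
        total += (x as i64) & mask;           // adds x or 0, no branch
    }
    total
}

fn main() {
    let data = [5, -3, 12, 7, -8];
    assert_eq!(sum_above_branchy(&data, 0), 24);
    assert_eq!(sum_above_branchless(&data, 0), 24);
}
```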
I disagree. When programming C you have to remember so much stuff and be so careful about even the simplest things; this mental effort takes from your resources, and I don't care how good a C programmer you are.
The mental capacity saved can be invested in higher-level design issues that will get you a lot more in the long run.
I am not arguing that higher-level languages are worse and that everything should be coded in C. Each problem requires different tools for its solution. And to that end, high-level concepts should be done in a language-agnostic manner.
However, lower-level understanding is paramount. For instance, and most-importantly today, taking advantage of multi-threading requires understanding of cache-coherence, memory-alignments, et cetera.
Therefore, while a proper serial algorithm today may be correct its scalability is going to be limited without lower-level understanding. Although, I'll admit that can be built into the higher-level languages (like concurrency in Clojure for instance), but I personally prefer to understand what is going on rather than blissful ignorance :)
It will be slower, though. C is king in raw performance, and in critical areas where that does matter you don't really have a choice but to write it "by hand" (aka in C).
I'm not entirely clear on your point. Are you suggesting that writing in C as opposed to Rust is somehow morally cleansing? Or are you suggesting that having higher level abstractions is somehow ruinous to ability?
The evidence is pretty clear that programming in C does not confer enough skill to prevent disastrous mistakes despite its near-hardware level of abstraction, so I'd really like to know what you're saying here.
I don't think I worded my original comment in the best way. However, I find that the people who make statements like "C makes it far more difficult to write safe and correct code" haven't touched the language in years, if ever. Hence it's no wonder it will be difficult for them, because they have not practiced or been exposed to good C idioms. Correct code in C is actually easy since the language is very simple. Due to this simplicity, unit-testing is very easy, yet lots of legacy codebases don't use these testing strategies to their potential. If they did, lots of issues would be corrected.
Although I cannot refute that there are additional memory-related issues to be aware of in C, it is precisely such awareness that makes someone a better programmer, because it's how the underlying hardware works. For instance, parallelizing algorithms must take into account cache-alignment boundaries and so on.
Abstractions are good. However, too much abstraction is bad. Just as too much of anything is a bad thing. When people rely solely on such abstractions, it is in fact ruinous to ability.
That being said, higher-level languages certainly have their place. But I firmly believe that C is a high-enough abstraction for systems programming and with proper idioms and testing strategies, it can be just as safe as the plethora of garbage-collected languages out there in the wild.
> Correct code in C is actually easy since the language is very simple.
1. No, it's not. Witness the various arguments that have happened on HN over the years concerning whether the imprecise language of the spec makes some idiom undefined behavior or not.
2. The evidence over the past 35 years has not shown that correct code is "simple" in C.
> For instance, parallelizing algorithms must take into account cache-alignment boundaries and so on.
…which Rust makes you just as aware of as C does.
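For example, a small sketch (illustrative names are mine) showing that Rust exposes the same layout control as C, here padding a counter out to a 64-byte cache line to avoid false sharing between threads:

```rust
// Sketch: Rust gives C-like control over data layout, e.g. aligning a
// per-thread counter to a 64-byte cache line so that two counters never
// share a line (avoiding false sharing). `PaddedCounter` is my own name.
#[repr(align(64))]
struct PaddedCounter {
    value: u64,
}

fn main() {
    // The alignment attribute also rounds the size up to a full line.
    assert_eq!(std::mem::align_of::<PaddedCounter>(), 64);
    assert_eq!(std::mem::size_of::<PaddedCounter>(), 64);
    let c = PaddedCounter { value: 7 };
    assert_eq!(c.value, 7);
}
```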
> But I firmly believe that C is a high-enough abstraction for systems programming and with proper idioms and testing strategies, it can be just as safe as the plethora of garbage-collected languages out there in the wild.
1. Rust isn't garbage collected.
2. Your firm belief is contradicted by 35 years' worth of memory safety track records of large-scale software written in C.
"The evidence over the past 35 years has not shown that correct code is 'simple' in C."
"Your firm belief is contradicted by 35 years' worth of memory safety track records of large-scale software written in C."
Keep in mind, of those 35 years, it only makes sense to consider the last decade-onward (or so) in comparison. While the language hasn't changed a whole lot, programming methodologies certainly have. How many of those libraries in question were written within the last 5 or 10 years? What is the code coverage on them? Et cetera.
I am not trying to argue that C is perfect and everything should be written in C. That would be crazy! There are definitely issues with C no doubt--but so many people say C is dangerous and difficult when in fact they don't even practice using the language. This is no different than practicing Violin on a daily basis and saying Cello is difficult so people should stop playing Cello (not a perfect analogy but you know what I mean).
The other comment is just me ranting about how the majority of programmers don't understand important computer-science concepts because higher-level languages remove them from such concepts. And sometimes it's important to have a higher-level abstraction to help solve certain problems. But people have a tendency to become lazy, and therefore they lose the fundamentals over time.
> Keep in mind, of those 35 years, it only makes sense to consider the last decade-onward (or so) in comparison. While the language hasn't changed a whole lot, programming methodologies certainly have.
The last decade has seen an explosion of modern C++ code using best practices that routinely exhibits the same memory safety issues. C is worse.
> There are definitely issues with C no doubt--but so many people say C is dangerous and difficult when in fact they don't even practice using the language.
I practice C all the time and I think it's dangerous and difficult. I've seen so many brilliant programmers accidentally create game-over RCEs via use after free, for example.
Is it? You're telling me that all code in all browsers has been re-written into C++11 (or C++14) with best practices? I don't believe you. At a minimum, I'm going to need some documentation before I believe that.
[Edit: I'm not trying to pull a No True Scotsman here. I just doubt that browsers have been completely rewritten in modern C++, or with anything approaching best practices. I've seen how long old code lives, so I won't believe it without some supporting evidence.]
Most exploits tend to be in new code (contrary to popular belief), which in all modern browsers is written in modern C++. The WTF (Blink/WebKit) and the MFBT (Firefox) are state-of-the-art template libraries; you are free to search for those libraries and verify for yourself. New C++11 features such as rvalue references do nothing to avoid memory safety problems; in fact, they make them worse, since "use-after-move" is now a problem whereas it wasn't before.
I know it's hard to believe, but C++ is not memory safe, old C++ or modern C++, in theory or in practice. The new C++ features do effectively nothing to change this. As far as use-after-free goes, C++ basically adds safety over C in two places: (1) reference counting is easier to use and is easier to get right; (2) smart pointers are arguably somewhat less likely to get freed before accessed again due to the destructor rules (though I think (2) may not be true in practice). Browsers have been making use of these two features for a very long time.
Bringing up modern C++ here is "no true Scotsman" unless you can point to a specific C++11 feature that browsers are not using that is a comprehensive solution to the use-after-free vulnerabilities they suffer from. There is no such feature I am aware of.
No, I wasn't asserting that there is some magic C++ feature that the browsers aren't using. "Most exploits tend to be in new code" was the piece of your argument that I was missing.
I am certainly biased, as I enjoy procedural languages for their simplicity; there are so many constructs in the others. As the saying goes, C++ is my favorite 4 languages. I hope that doesn't become the fate of Rust. Go seems to get that part right, though that's probably debatable.
> "Correct code in C is actually easy since the language is very simple."
I broke back into this account just to share my marvel at the single most wrong sentence in human history. The Mona Lisa of being incorrect about programming.
I audit code for security vulnerabilities in several languages professionally. You are correct about the benefits of unit testing, which is language agnostic, but C is not and cannot be Easy To Be Correct, ever, no matter what.
The Rust standard library does assume that allocation succeeds. The language itself knows nothing about the heap, and so you can write allocators that do whatever you wish. Side note: on most Linux distros, overcommit is on, and so malloc will basically always succeed; the OOM killer will kill your program before you'd get a failure. I am less knowledgeable about OSX and Windows.
Holy crap, I'd never have imagined that... A design that assumes allocation always succeeds flies in the face of decades of safety/security-critical coding wisdom. I strongly recommend the team revisit that and change it somehow to account for failures, NULLs, whatever. Actually, the same goes anywhere a failure-prone resource, especially a hardware one, is acquired. C apps can handle this issue, so Rust should as well if it's to replace them.
EDIT: Thanks to replies for clarification that it's just one allocator, aborts, and others are available. Still feel weird about it but that's better.
The standard allocator will abort on oom error. For applications which need to be tolerant to oom errors, you need to use a different allocator.
Most programs can't tolerate oom errors, though, and it would be absolutely unreasonable for every function in the standard library that might perform an allocation to return an error that has to be handled by every programmer all the time.
EDIT: C's solution is to make it very easy to ignore oom errors, so most programmers just don't handle oom errors. Rust's solution is much better.
Every function in the C++ standard library reliably communicates allocation failure to the application without relying on aborting the whole program. If Rust can do it, C++ can do it too. Rust got itself into this trap by eschewing exceptions.
Maybe avoid using judgmental language like calling a standard allocator that aborts on oom a "trap." The Rust team made conscious design choices in full awareness of the trade-offs. Moreover, Rust actually does have thread unwinding, and even the ability to catch an unwinding thread, so it is not true that the Rust standard library could not have unwound on oom.
Rust's standard library just isn't designed for writing applications that need to survive oom errors. That's fine; it's not designed for a number of other applications which Rust the language is well-suited for either (operating systems, for example). It's designed for the majority use case, because life is full of trade-offs.
This is absurd. Most C applications do not need to persist through OOM errors, and as people have repeatedly reiterated, it is totally possible to write a Rust program that persists through OOM errors.
Note that Rust, the language, is perfectly capable of handling memory allocation failures, it's just the standard library that makes the assumption. Embedded environments wouldn't use the standard library for numerous reasons anyway, and the "core" library uses no allocation at all. That said, IIRC handling of failure in the standard library will happen eventually, I believe there's just no consensus yet on the best way to do it.
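(For what it's worth, stable Rust did later grow a fallible-allocation API. A sketch assuming a toolchain with `Vec::try_reserve`, which was stabilized in Rust 1.57, well after this discussion:)

```rust
use std::collections::TryReserveError;

// Sketch of fallible allocation via `Vec::try_reserve`: the caller gets
// an `Err` back instead of an abort when the allocation can't be made.
fn try_big_buffer(n: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve(n)?; // may fail; does not abort the process
    buf.resize(n, 0);    // capacity is already reserved, no new allocation
    Ok(buf)
}

fn main() {
    assert!(try_big_buffer(4096).is_ok());
    // A capacity this large is reported as an error, not an abort:
    assert!(try_big_buffer(usize::MAX).is_err());
}
```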
At least for the nightly builds, you can specify a closure to run when OOM happens. The only restriction is that it cannot return, but you can "recover" from panics in the nightly standard library as well. The thinking was, I think, that in most cases the default behavior of abort is the correct behavior. Recovering from a failed malloc is really only relevant in large allocations, and there are multiple paths (including directly using __rust_allocate) to recovery in that case.
Furthermore, keep in mind that the _pure language_ Rust has no concept of dynamic allocation (apart from "language elements" like __rust_allocate, which are kind of like a pre-linker).
> Side note: on most Linux distros, overcommit is on, and so malloc will basically always succeed; the OOM killer will kill your program before you'd get a failure
I suppose this coreutils impl doesn't do it but one could perhaps use rust w/o its standard library and use strictly some libc instead. Would probably still be a net win.
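A minimal sketch of what that looks like, assuming a POSIX platform (the symbols are declared by hand here; in practice the `libc` crate does this properly):

```rust
use std::os::raw::{c_int, c_void};

// Sketch: bypassing the Rust standard library's I/O and calling the
// raw POSIX `write` from libc directly via FFI.
extern "C" {
    fn write(fd: c_int, buf: *const c_void, count: usize) -> isize;
}

fn main() {
    let msg = b"hello from libc\n";
    // fd 1 is stdout; the call is unsafe because the compiler cannot
    // verify the foreign function's contract.
    let n = unsafe { write(1, msg.as_ptr() as *const c_void, msg.len()) };
    assert_eq!(n as usize, msg.len());
}
```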
Writing Rust code interfacing libc is a rather frustrating experience though. Many libc interfaces use constructs that Rust really wants you to avoid (for very good reasons), things such as global mutable variables (errno and for callback functions without context pointers) or union types. I know about at least one cargo crate attempting to "Rustify" libc with some level of success.
Nonetheless, this is what I typically end up doing when writing Rust code, as much of the Rust standard library is simply not ready for "serious" usage.
Personally, I rather enjoy writing clean Rust wrappers for gross C APIs. Doing so does require a solid knowledge of C, and familiarity with a small bag of Rust tricks.
In my experience, the Rust standard library is extremely convenient and it handles corner cases very well. There are definitely still holes, but that's what "cargo add $CRATE_NAME" is for.
My biggest annoyance with Rust (and it's not a huge one) is that if I'm doing something off the beaten path, I'm probably going to need to wrap a C library or two that nobody has wrapped yet. There's a lot of great stuff on crates.io, but it's only a minuscule fraction of the total C ecosystem.
There is no way to select over a range of TcpStream objects. There is no stable way to select over a range of mpsc channels. There is no way to access a TcpStream in a non-blocking manner. There is no support whatsoever for UNIX signals. I could go on.
I don't miss the things you mentioned very much, and I think it's an exaggeration to say that these things are what makes Rust's standard library not usable for anything serious. But anyway:
> There is no way to select over a range of TcpStream objects.
mio
> There is no stable way to select over a range of mpsc channels.
Yes, this is annoying. We should stabilize MPSC select. You can use BurntSushi's chan for now though.
> There is no way to access a TcpStream in a non-blocking manner.
mio basically supports this, no?
> There is no support whatsoever for UNIX signals.
Do many languages have good support for this? The C and C++ standard libraries don't; that's part of POSIX. Signals interact very badly with garbage collectors, so I can't imagine many languages have this.
So I think our disagreement is semantic. When you say the Rust standard library is "unusable", you mean that you usually need Cargo packages in addition to what the standard library provides in order to write programs in Rust. That is true, but that's just a design difference between Rust and some other languages, like Go. For a systems language like Rust, I think that focusing on having an excellent package manager instead of having a super-comprehensive standard library was the right call. The standard library is very well designed for what it does (IMHO), and I credit the community-based library stabilization process for that.
As for Go and signals, I think it's pretty debatable as to whether you can have "excellent support" for signals without the ability to write a true signal handler. Note that this is not a fault of Go and is pretty much inherent to any garbage collected language. I suspect the Rust community would not be particularly happy with an implementation of signals that had hardwired sending to MPSC channels. Even having MPSC channels in the library at all is somewhat controversial...
Here is a talk I recorded about Linux systems programming with Rust. Quality is only 720p, but maybe you'll still find it useful. The speaker compares solutions in C with equivalent ones in Rust, and what kinds of advantages / safety features you gain:
> ...faces problems solely due to the nature of the language its written in.
Really, this is true of any programming language and any involved enough program. It's just that with C and security-critical programs, there's some unfortunate concordance between the errors you want to avoid and the errors that are harder to avoid.
> Many GNU, Linux and other utils are pretty awesome, and obviously some effort has been spent in the past to port them to Windows. However, those projects are either old, abandoned, hosted on CVS, written in platform-specific C, etc.
I have seen such a paragraph in another project's README, IIRC a Go rewrite of standard utilities. I do not understand why a project would be obsolete because it's on CVS or is old. CVS is simpler than Git, albeit less capable. I, for one, prefer it over Git for this reason, and others may do so too. Why would the end user care?
And why would we care about the age of a programme if it works?
Now, that said, the authors need not justify anything, they are free to do whatever they want, and I guess it's fun to code this stuff. I tried this just to play with Golang when it was 1.1.
And lastly, that Makefile is really a bunch of shell scripts and some common environment variables. There is no real dependency tracking in it, and it is easier to maintain a bunch of shell scripts than a seriously ugly and complex Makefile like this. N.b. when I say dependency tracking I mean dependencies among input and output files of processing commands, not tasks. I guess Cargo would know how to do that, and how to not rebuild if the build artefact is already there, so it's a useless use of make. And that Makefile is very GNU-specific, which doesn't fit a project whose purpose is to be cross-platform. Also, I guess, tho I'm really unfamiliar with Rust and Cargo, if that Makefile were removed, maybe on Windows they'd be able to drop development dependencies on Cygwin or MSYS.
Open source is useless if the majority of people in the open community -- for whatever reason -- can't build it.
If a person can't build the thing, they effectively have no power to contribute to changing it. If they have no power to contribute to changing it, it's... spiritually, missing the most essential parts of open source.
This is the fundamental issue driving comments like the quote above.
You can tell me that not having a dependency management system, not having a sane version control system, using "some" C compiler (sans a matrix of tests specifying a range of expected good compilers), arbitrarily fine-grained platform specificness meaning an average OSS contributor can never reasonably test their changes against all targets..... all of these things can be "worked around". But at some point, the litany of issues -- some of which take a new contributor dozens of hours to work around -- becomes a simply overwhelming barrier to contribution.
It's time to admit that a foundation of workarounds in FOSS development processes at the very core of our systems is a problem.
Re-writing it in language $x may not be the solution, but it's certainly understandable that there's a widespread desire for simply getting better toolchains underneath our most basic essential systems.
This project does not really improve upon the situation, as it uses what it is to replace for building, and a very complex Makefile where there's no need for one.
I've said nothing about dependency management as in fetching code that the project depends on. I'm talking about compilation dependencies, i.e. file a.o depends on a.c, a.h, b.c and b.h. Make is for this:
a.o: a.c a.h b.c b.h
cc -o ${.TARGET} ${.ALLSRC}
But nowhere in the project's Makefile are the rules in this fashion; they are like shell aliases. What I wanted to say is that the Makefile could be replaced with a bunch of shell scripts that would be easier to use and maintain.
For the rest, it seems that we mostly agree, tho I disagree that CVS is not sane; it's perfectly usable.
On line 4, ifneq appears, which is a GNU make feature. .ALLSRC is a BSD feature. It's a fine GNU makefile. Also, you should look at Cargo.toml to see better what is going on.
The issue is not whether projects are old, it's whether they're well-maintained. Please see the Core Infrastructure Initiative's Best Practices Badge project for one attempt at measuring the "health" of an open source project.
I could see preferring svn to git because of the simpler model, but cvs? No thanks, a vcs without atomic commits is not much better than snapshot archives, maybe worse actually.
a) SVN seems daunting and complex, tho I didn't ever dive into it. CVS is so simple and easy, a half-arsed programmer like me can actually understand it. Things like git and mercurial are way more complex.
b) RCS is real handy for single files, e.g. a free-standing text file or shell script. But when the thing grows up, it is very easy to integrate the fileset into a CVS repo preserving its history: move the ,v files to $CVSROOT/$MODULE/.
c) The repository model of CVS is as transparent as it gets.
d) The keywords like $Id$ are really useful.
e.g. I keep my system configuration in "~/Checkouts/system-config", and I have a script that cp's the files to appropriate locations using a map file. When I'm not sure whether the active config is up to date, I can verify very easily. And I can be sure that dirty files won't be active as long as I don't expressly copy them. I know that SVN has this too, but I find CVS easier to use in general.
I guess for fast paced, very active development, yes CVS is sub-par, but for personal stuff, or for something that is patched say at most two-three times a month, it's O.K. It boils down to personal preference.
SVN is way simpler at the interface than CVS, you should really look into it. SVN is a spiritual successor to CVS, and is trivially easy for a CVS user to pick up. We switched from CVS to SVN at work several years ago and everyone was happy with the change.
there are two things that are nice about svn if you use it just like cvs.
1. Atomic commits. I edit ten files and that's one checkin, rather than the per-file checkins of cvs. On a low-volume project, not a big advantage, but if you've ever hit a conflict on a bigger project with cvs, it can be kind of a pain to resolve; seeing the whole commit of the other guy is helpful. If you don't run into this more than, say, monthly, it's not worth it.
2. Offline diffs. svn keeps a pristine copy of everything you checked out, so you can diff against it even if the central repo is down, or you're working from the beach. This one is pretty nice regardless.
svn is a pretty nice upgrade, if you're working with a distributed team.
All appreciated, but nowadays my biggest repo is my emacs config, some tens of thousands of SLOC, about 2000-2500 of which I authored (I commit packages I use too, no elpa). And my repositories are local, i.e. they are in /var/cvsrepos. Other than that, I admit that SVN is indeed superior.
I believe that one should use the best tool for the case, not the overall best tool in every case.
That said, I'll give SVN a look. I can consider switching when I have the time, if it is easy to import from RCS, because I do use it a lot here and there, mostly for plain-text documents. I do not like maintaining unrelated things in a single repository.
> If you're learning for the sake of learning, i'd dig into git.
Well, nearly everything moved to git. I've used it for a while, mostly commit/pull/clone. I actually moved to CVS from git :) It's a bit like perl, git, you either need to be an expert at it, or else you can get going, but at the end you create a mess.
> [...] it's probably not worth the overhead. Sounds like you have a good, efficient setup.
Mostly, yes, but mostly because I don't really need much more than recording my history, occasionally looking at what I did in the past, and more rarely working on short-lived branches. But there are a couple of tools that, if I ever code them up, I will make open source. Tho if I ever do, and they take off, I'll use a 'just send a patch on the mailing list' approach. I believe it's easier to deal with. No flamewars on VCSes :)
Agreed; I'd go even farther to say that old is a feature. Now, as a side effect of its age, it may not conform to modern-day best practices and styles… which might make it harder to fix newly discovered bugs or add new features.
Heck, the increased difficulty in creating feature bloat could even be considered a feature in and of itself, too!
Git makes it easier to accept contributions and code review from a larger developer community, thereby allowing building a higher-quality product.
(Incidentally, while I think that part of the reason why Git is this way is due to design differences from CVS, this isn't essential to it being true. Git is also better than Monotone or Fossil or Bazaar, despite being much closer in design, because it has network effects that those other systems don't.)
> Git makes it easier to accept contributions and code review from a larger developer community, thereby allowing building a higher-quality product.
It is very easy to contribute if you know git. But you can mess it up if you don't. I made a two-line bugfix patch to flycheck, and heck, I nearly pasted the patch into a comment in the issues because I didn't know anything about how to make a pull request on GitHub or how to commit so that the puller would be happy. I didn't want to do something embarrassing, and spent two hours reading and reading about how to submit my patch properly. And the whole fix was made and tested in about two minutes. If only I were able to submit a dumb patch, I wouldn't have to care whether they used git, cvs, or tarballs and quilt.
I think it's easily fair to say that git is by far the superior option to CVS, and preferring CVS over git is objectively incorrect. I think that you would be wise to invest the time in learning how to use git correctly.
Objective truth is second to factual productivity. I did use git. It's powerful, but more than I need. CVS takes ten pages of reading to grok. And then it's out of my way. I couldn't learn git in years, and there's something new every other day, it gets in my way. And I don't have time to chase a command line on stack overflow, I need that time for what I'm actually doing.
I can't speak for other projects, but for anything I maintain on GitHub, "Hi, I don't know how to use git, but here's a patch file" is a perfectly acceptable bug report! I'm not claiming that git makes it 100% easy, just that it makes it more easy.
That said, GitHub does let you use a web-based editor to create a git branch behind the scenes and open a pull request, with no VCS client required at all. So that's a point in GitHub's favor. (Again, it's not inherent to git, and a hypothetical CVSHub could do that, but GitHub exists.)
I am excited to see this as I was working on a similar project last year (rewriting the BSD userland in Rust), but it's from pre-1.0 Rust so not really as idiomatic as what is coming out of the Redox project.
From the README: "These are based on BSD coreutils rather than GNU coreutils as these tools should be as minimal as possible."
If all of these utilities are running in userspace, per Redox's microkernel architecture, then what is the advantage of intentionally not making them feature-rich?
It has the advantages that come from following the UNIX style. To learn more, I recommend reading this short paper by Rob Pike and Brian Kernighan, published in 1984: http://harmful.cat-v.org/cat-v/unix_prog_design.pdf
Rewriting coreutils is neat, but a project I'd really look forward to would be a strict POSIX base expanded with warnings or errors on valid-but-risky constructs, e.g. echo -n. Even more so if it included a shell (with static analysis of useless uses and dangerous patterns). That would make writing cross-shell scripts much easier.
POSIX defines echo as taking only string operands and no options, but notes that behaviour when the first operand is `-n` is implementation-defined.
BSD and GNU echo implement `echo -n` as not printing a trailing newline, but `echo` commonly calls to a shell builtin which may or may not follow that behaviour (and may switch behaviour depending on whether the shell is in "sh mode" or not), so `echo -n` could print `-n<newline>` or nothing whatsoever (empty string and suppressed newline) depending on the utils set, the shell, and the shell's runmode.
GNU echo also supports -e, -E, --version and --help options, and much like -n shells and other utils set may or may not support these.
For instance, on my machine (OS X 10.11):
* zsh (builtin) interprets -e, -E and -n as options (but not version or help)
* bash (builtin) also does, except when invoked as sh in which case it does not and all parameters are literal (this may also apply to zsh)
* dash (builtin) interprets -n, but none of the others; the bash note may also apply to it.
* BSD echo interprets -n but will print -e, -E, version and help literally
* GNU echo interprets all of the above
So if, in a script you distribute to uncontrolled third parties as an sh script (rather than e.g. a bash or zsh script specifically), you use echo with any non-literal parameter or with one of the parameters listed above, you will suffer portability issues.
And that's just for measly trivial echo (and incidentally why you should always use printf rather than echo in scripts you try to make portable).
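To make the printf recommendation concrete, here's a minimal sketch of the portable idiom; the strings are just examples, and the behaviour shown is what POSIX specifies for printf (unlike echo's implementation-defined corners):

```shell
#!/bin/sh
# Portable: POSIX fully specifies printf's behaviour,
# unlike echo's handling of -n and backslash escapes.

# Instead of the unportable `echo -n "no newline"`:
printf '%s' 'no newline'

# Printing a string that starts with a dash is safe with printf,
# because operands after the format are never parsed as options:
printf '%s\n' '-n'

# Escapes live in the format string, so they are always explicit:
printf '%s\t%s\n' 'col1' 'col2'
```

The `%s` conversion prints its operand byte-for-byte, which is exactly the guarantee echo can't give you across shells.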
The only operating systems that are certified to conform to POSIX are old Unix operating systems. GNU has always been non-POSIX. And honestly, a POSIX implementation would be more trouble than it's worth.
The point of coreutils is to have utilities that make up the ability to write scripts for and interact with your operating system, right? Well, what operating system?? A POSIX-compliant one? Or just a mostly-POSIX-compliant one? Or one with POSIX extensions? How would your utilities know the difference? How would the OS know how to deal with these utilities? Would the user know the difference?
Ultimately, each platform has quirks, and it is up to the developer to port and test their script or application to a platform and make any necessary changes. This extends to far more than just POSIX compliance.
That's not a question of certification or extensions, I'm talking about being able to write portable scripts. Due to the interaction between its definition and its extensions, echo is a prime example of being impossible to use portably (except in the very restricted case of only literal strings without escapes which don't start with a -).
> Ultimately, each platform has quirks, and it is up to the developer to port and test their script or application to a platform and make any necessary changes.
That is not humanly feasible, and that's why specifications exist. You can't "port and test" your script to a platform which doesn't even exist yet, but if you follow the specification and the platform implements it (assuming it does so correctly), your scripts will run.
Ok, I think I understand the confusion now. You seem to be of the impression that shell scripts are like bytecode executed in a virtual machine. That is the only way I know of that you could write an application for a platform that doesn't exist and expect it to work. But even for that to work, it would need to be the same VM, and bytecode generated by & for it, or there's still no guarantee it will work.
Of course, you can already write a shell script for a particular shell, distribute that shell to that system, and depend on the shell to properly execute your script [by using internal functions only]. But that defeats the whole purpose of following a standard like POSIX, or caring at all how any given platform's 'echo' program works.
Bottom line, though: two independent implementations of a standard provide no guarantee they will work together. Practice over a couple decades shows this to be the case.
> Ok, I think I understand the confusion now. You seem to be of the impression that shell scripts are like bytecode executed in a virtual machine.
What in bloody hell are you talking about?
> Of course, you can already write a shell script for a particular shell
Which is irrelevant to my comment as that's not what portability means.
> But that defeats the whole purpose of following a standard like POSIX
Exactly.
> Bottom line, though: two independent implementations of a standard provide no guarantee they will work together.
If following a standard can't ensure your program works on two different implementations of the standard, you don't have a standard, you have decorated toilet paper.
Which is more or less what the "commands and utilities" part of POSIX is.
ECHO(1) FreeBSD General Commands Manual ECHO(1)
NAME
echo — write arguments to the standard output
For this definition, -n should mean merely a sequence of two bytes to be written to stdout. Why not just use printf instead? It is way more flexible and powerful, and `printf x` always prints exactly `x`.
The POSIX spec is basically the minimal intersection of all commercial UNIX distros of its day. It defines a minimal system that happened to be what everybody already had implemented. For the most part you are allowed to add features to a POSIX base to make a real OS, since that's what everybody had.
This is also why the old Windows POSIX subsystem was so useless. It implemented only the minimal amount needed to check off the box on a feature list, and none of the stuff you need to actually make a system usable.
That said, POSIX has a fair bit of braindamage baked in and people can be excused for ignoring the worst parts and instead doing the right thing.
Note that the -n option as well as the effect of
`\c' are implementation-defined in IEEE Std
1003.1-2001 (``POSIX.1'') as amended by Cor. 1-2002.
-- http://www.freebsd.org/cgi/man.cgi?echo
So there's probably some system (or shell) out there where -n doesn't work?
If you want to suppress non-compliant behavior, set the POSIXLY_CORRECT environment variable. You can also specify the version of POSIX to comply with, because the standards aren't always standard.
I would hope your cross-shell scripts are also conforming to a certain shell script language, and are also resetting all environment factors which change the function of various commands. Not that endianness is ever a worry with a shell script ............
(Also note that POSIX supports printf, which you can use to insert any character string you like, basically)
The OP is "luring" people with a small amount of information i.e. "bait," and other people will "bite," as a fish would, to learn the rest of the info.
Yeah, I know the phrase. What I find difficult to believe is that three random people responded with the same response to a simple two-line comment.
I mean, we see tons of comments with a "small amount of information"/cryptic references that might similarly puzzle people on HN, but not equally many "I'll bite"s.
"I'll bite" is a pretty common response when someone says something without elaboration that you sense they really want to follow up on. Maybe its a regional thing, but I'm having a bit of a problem thinking of another phrase.
Though I like the ideas behind Rust, the code is awful. If you look at the code you'll see a lot of mysterious symbols (as in Perl) and strange constructs like wrap, unwrap, Arc, etc. They make the code less readable.
Also Rust doesn't have exceptions so you have to wrap almost any function call with let/match/Ok/Err. Ugly.
Are they really starting a new OS-level thread for every directory found? Looks like an easy way to exhaust system resources to me. Also, I don't see the code that would collect error information if the thread panics.
>Also Rust doesn't have exceptions so you have to wrap almost any function call with let/match/Ok/Err. Ugly.
respectfully,
This is pure nonsense.
First of all, Rust does not have a runtime, and AFAIK you need a runtime to manage stack unwinding in order to provide exceptions.
Second, not every language should be like the high-level languages; there is no rule that it must be like C#, Java, Python, etc. I use a lot of those for my work when I need a simple thing done, but Rust is designed to do low-level stuff, and I cannot understand how not having exceptions makes a language ugly (especially when you code at a low level).
> I mean if you have to write that match construct around every function call
Not quite.
1. you're supposed to handle errors around function calls which can fail, which is a strict subset of "every function call"
2. rust has a number of higher-order constructs to facilitate that handling[0][1][2], not just raw `match` statements or expressions.
That aside, for rust explicit error handling is considered a feature both at the language level (allows for fewer runtime requirements and much stronger guarantees — check out exception-safe C++ for what happens when low-level meets exceptions) and at the user level (by forcing a conscious and explicit decision, whether it's crashing the system, handling the error or passing the ball upwards)
> the code quickly gets bloated, doesn't it?
Does C code quickly get bloated? Because you're also supposed to check for error codes after each function call which can fail, and C doesn't provide much abstractive power to mitigate that.
Rust has macros, so boilerplate can be swept up pretty tidily. In the case of propagating errors upwards, there is the try! macro, which encapsulates a match that returns early with the error.
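A hedged sketch of what that looks like with std::io; the function names are made up for illustration. At the time of this thread the shorthand was spelled try!, which later became the ? operator used below:

```rust
use std::fs::File;
use std::io::{self, Read};

// Explicit propagation with `match`: return early on the Err arm.
fn read_file_verbose(path: &str) -> Result<String, io::Error> {
    let mut f = match File::open(path) {
        Ok(f) => f,
        Err(e) => return Err(e),
    };
    let mut s = String::new();
    match f.read_to_string(&mut s) {
        Ok(_) => Ok(s),
        Err(e) => Err(e),
    }
}

// The same logic with the propagation shorthand. In older Rust this
// was written `try!(File::open(path))`; today it is the `?` operator,
// which expands to essentially the match above.
fn read_file(path: &str) -> Result<String, io::Error> {
    let mut f = File::open(path)?;
    let mut s = String::new();
    f.read_to_string(&mut s)?;
    Ok(s)
}
```

Both versions propagate the error to the caller instead of panicking, so the "wrap every call in match" cost collapses to one character per fallible call.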
Comment OP's stance is valid. "Magic" in this context are keywords or symbols that are not immediately clear to programmers that don't work in rust. One of the reasons golang is so successful is that there is very little magic in the syntax, and even when there is it's fairly easy to grok (an example would be the `go` keyword).
FWIW I also share their opinion that rust is unapproachable.
> One of the reasons golang is so successful is that there is very little magic in the syntax, and even when there is it's fairly easy to grok (an example would be the `go` keyword).
Do you have a specific symbol you would like to change in Rust, and what would you like to change it to?
The only example I've seen (in a child comment to yours) is effectively a complaint that Rust has lifetimes and Go doesn't, which is effectively saying "you should have a garbage collector like Go does", which is an argument against a fundamental design decision of Rust. If you want to argue that you should always use a garbage collector, argue that directly instead of making vague negative comparisons between Rust's and Go's syntax.
I'm definitely a Rust fanboy, but the single-quote syntax for lifetime annotations can be irritating. Several editors I've used automatically insert a second quote to match, and I am frequently unable to disable that behavior without losing all paired delimiter insertion (like for parentheses or braces).
It's a minor quibble to be sure, but it's the only language symbol that bothers me when writing Rust. Not sure what I'd suggest replacing it with...backtick, maybe? Pipe? @? ~?
There aren't many other special characters on a QWERTY keyboard that aren't already used in Rust. Which I think gets at one of the stumbling blocks that I see in the various Rust syntax bikesheds among those who haven't worked in the language: it's just alien until you've used it a bit, especially if you're writing a lot in pseudocode-y dynamic languages.
We had a big debate about it back in the day, and ' won as it's about as visually lightweight as you can get. I think the other characters you suggested would invite even more Perl comparisons.
Because Rust has lifetimes and Go doesn't, so Go doesn't need syntax for them. If you want to argue that Rust should use a garbage collector like Golang does (which entails arguing that everybody who is using Rust is wrong for not wanting an always-on GC) I'm happy to have that argument, but say so explicitly.
Just wrt the let/match/Ok/Err == ugly comment: I don't know Rust very well and I haven't looked very closely at the code in question, but why would Rust force you to wrap almost any function call with let/match/Ok/Err, though?
I am assuming the issue at hand here is when you call a function that returns a Result (if that's the name of the parent of Ok/Err)?
Couldn't one let the return values 'flow up'? For example: if a function starts returning Results and you don't handle it on the level above, you start returning Results too...
Also, I suppose there's ways of flattening results to avoid having nested Results (I would guess and_then does that based on its signature?)?
In this scenario, you could argue that one now replaces match with map at every function call where you do not actually handle the Err, and that this is ugly too. But the alternative is hoping that the caller happens to have read your source code (or having checked exceptions), which is, arguably (to be diplomatic), the situation with unchecked exceptions. Then what happens when you change the code, and so on and so on...
This way lets you achieve something similar to checked-exceptions, and in addition to gain composability of Err (and the like), without adding another language construct (which a language without exceptions would have to do).
I've got a great deal of experience in other languages that do this, and it's worked out pretty well, at least for me, once you get the hang of it (i.e. functional coding).
From where I'm sitting, that would be a wise choice for Rust, at least if I'm understanding it correctly, as it tries to be a safer alternative to other systems languages. However, arguably (:), it is OK for dynamic languages or similar, which trade correctness for conciseness and, arguably (again :), speed of development, to simply have unchecked exceptions.
Also, it is possible I misunderstood your comment and/or how that code was ugly, in which case I hope the downvotes/replies won't be too harsh :)
Yeah, codedokode is wrong. Rust lets you keep returning Results as you described. You don't even need to actually write a match statement yourself: there's a built in try! macro, which will do an early return from a function if a function call returns an Err value. And soon there will be a new ? operator which does this inline so you don't even need the macro.
What you end up with are much more flexible "exceptions" that don't need a lot of extra compiler support.
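A small sketch of that "flowing up" with the combinators mentioned above; parse_sum is a made-up example function, not from the project:

```rust
use std::num::ParseIntError;

// Errors flow up through `and_then` / `map` without a single explicit
// `match`; the first parse failure short-circuits the whole chain,
// and the caller still has to deal with the Result.
fn parse_sum(a: &str, b: &str) -> Result<i32, ParseIntError> {
    a.parse::<i32>()
        .and_then(|x| b.parse::<i32>().map(|y| x + y))
}
```

`and_then` flattens what would otherwise be a nested Result<Result<...>>, which is the flattening behaviour guessed at above.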
The CoreUtils sort is amazingly efficient; it can sort files nearly as big as your hard drive without matching memory requirements.
With all these node.js / Go / Rust CoreUtils implementations I'm still hoping for one of them to actually match the efficiency of the original implementation.
I wrote the initial sort about a year ago (w/o looking at GNU version's source). I just looked at the current version, and confirmed uutil's sort does not do any external sorting, meaning the entire input is stored in memory. Maybe someone can confirm that GNU's sort uses temporary file storage to do something like an external merge sort to reduce memory usage?
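For what it's worth, GNU sort is documented to spill sorted runs to temporary files and merge them, so memory use is bounded by its buffer rather than the input size. A quick experiment, assuming GNU coreutils (the -S/-T flags are GNU-specific and the file names are made up):

```shell
#!/bin/sh
# Generate a shuffled input file (seq and shuf are GNU coreutils).
seq 100000 | shuf > input.txt

# Cap the in-memory sort buffer at 1 MiB and point the temporary
# merge files at $TMPDIR; the input is far larger than the buffer,
# so this forces an external merge sort.
sort -S 1M -T "${TMPDIR:-/tmp}" input.txt > sorted.txt

# -c verifies the output is sorted (in the default, lexicographic order).
sort -c sorted.txt && echo "sorted OK"
```

If uutils' sort holds everything in memory, adding a similar run-and-merge pass would be the step needed to match GNU's behaviour here.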
What are the licensing implications for this kind of work? I assume the authors used GNU coreutils as more than just inspiration. They probably read all the original code and reused some of the solutions (obviously ported to Rust).
Shouldn't the derivative work still be covered by the GPL?
I've contributed to this project, and yeah, this was a major concern for me while I was doing it. If it shared a license with GNU coreutils, then code sharing would be free and the project would be built much faster because I could just use coreutils's algorithms. As it is, I haven't done any real, hard work for it because frankly, I don't want to re-invent that wheel.
The project isn't terribly far along. I wonder if just starting a GPLed fork and building on that instead wouldn't be a better idea.
I've submitted an issue asking to shift to a GPL license.
My general perspective on code I write that isn't for work - it has to be GPL. I refuse to have my code be yoinked by random corporations for their profit without having the code shared downstream.
Like many other programmers, I avoid GPL'd code like the plague. The idea that you can own an idea seems ridiculous to me, and it feels unjust to sue "random corporations" for using ideas that you published. We're standing on the shoulders of giants, and I see the GPL as a tumor that's draining the world's resources.
It's not about owning an idea, like a patent is. It's about receiving payment for someone else using the code I developed. The price for using my code is that you also release your source code. If that price is too high, you can't use my code. The hard part is defining "use my code" in the context of reading my source, then using those ideas in your own project. At what point does it change from gathering an idea to just copying my code?
All intellectual property (including copyright) is based on the fundamental premise that legal entitlement can be granted to ideas or "creations of the intellect."
I find your reasoning very strange, if anything, GPL ensures that the 'ideas' are instead made available for anyone to use, provided that they in turn do the same.
If anything would be a 'tumor' by your reasoning around ideas, it would be proprietary software, which is what GPL prevents.
If your code is GPLed you impose a cost on someone else to use it. Whether or not that cost is morally justified is not the point of the objection. I am making no judgment as to the righteousness of the goal of using a GPL or similar license.
By using a GPL you strictly limit who can consume your code to those who are willing to be bound by your views on what is right.
In this way (and limited explicitly to the scope of topics I am directly addressing in this comment), GPL licensing is similar to proprietary licensing: limits are imposed upon consumers of your code based on your licensing decision.
Again, I am not making a moral judgment or normative statement about what is good or right when it comes to code reuse or copyright in general.
Licenses which do not impose requirements upon consumers of the product are more free in the sense of allowing more behaviors, or placing fewer restrictions.
To me the key distinction is one of code freedom vs human freedom. Is the code free from malicious tampering or is the human free to make decisions and take action unhampered by another?
If I may quote the late Milton Friedman, "Heaven preserve us from the sincere reformer who knows what's good for you and by heaven he's going to make you do it whether you want to or not. That's when the greatest harm is done."
P.S. This is not an observation about you, because I don't know you, but I think we can observe the amount of hedging and defensive posturing I've taken by default in this response is illustrative of the type of response I can expect, based on observing similar discussions across various fora online. I hope that in this case that defensiveness was not necessary (:
What you describe seems to me exactly what John Locke said about Freedom of nature vs Freedom of people.
Freedom of nature is to be under no other restraint but the law of nature. Freedom of people under government is to be under no restraint apart from standing rules to live by that are common to everyone in the society
A license which enforce a share-and-share-alike system is to me much closer to freedom than if it allowed everyone to do what he likes, to live as he pleases, and not to be tied by any laws.
The basic division is negative rights vs positive rights, or at least that's how I frame it internally. A negative right is, very simplified, a right to be left alone. A positive right is a right to a specific item/privilege/behavior.
I think the wikipedia article[0] sums it up better than I can:
> positive rights usually oblige action, whereas negative rights usually oblige inaction.
There's some nuance between "do whatever you want" and "do whatever you want so long as it doesn't fuck with someone's day."
I'd really recommend the wikipedia article for a better treatment than I have time to give the discussion right now.
I'm actually not taking a position, but trying to help provide nuance in a debate that often teems with zealotry, and do so in a reasonable manner to provoke thought rather than vitriol.
My unexamined default inclination would be toward a BSD or MIT style license rather than a GPL. You may infer what you like about my default leanings in other areas if you like.
I've not given enough thorough thought toward software copyright to have a strong opinion as to the morally right position. This is not to undercut my comment and the observations I made above, but to make clear my standing which is ambivalent. I can certainly understand both sides of the argument and respect the position people speak from for either (to be explicit, the poles of the continuum being public domain on one end, with MIT/BSD licenses leaning far that way and hardcore copyleft on the other with GPL being the canonical example toward that end).
Toward the Friedman quote, that was not an attempt at appeal to authority to discredit the idea of copyleft, but an observation about the type of person (or at least the general tone of argument) who most vocally defends GPL licensing. If anything it was an appeal to authority to please be thoughtful about how to respond to my comment, and more to the point of why I included it, because it popped into my mind while responding.
This is a long-winded non-answer so far.
Really, what I was trying to do was to emphasize that there are costs to a GPL license, and that a reasonable person may think of those costs as such and not as freedoms. If you look at my response to a sibling comment, I'm thinking about this in terms of negative vs positive rights.
Looking at Friedman, I would certainly characterize him as someone who leans toward defining negative rights as freedom/liberty and positive rights as limitations thereupon. I would also characterize myself as leaning that way (so maybe I'm attributing my own views onto him). This is not to imply you are a hypocrite or declare that it is right or in any way argue against a GPL copyright in general, but just an observation.
With copyright (unlike patents), when I offer some code under some conditions, to first order I haven't subtracted anything from you -- you're free to ignore my code, maybe write your own. This isn't so different from offering you a banana for a buck, or a free bench in the park provided it carries a memorial plaque to my grandfather.
I used to default to the MIT license, basically saying "just don't be a dick about the authorship or the non-warranty". Over the years many companies, like Apple, incorporated this sort of software into platforms and products that circumscribe my freedom -- products that I "own" but can't legally control. The GPL (especially v3) is a compact of people building an alternative to this locked-down world. Closed platforms seem to me a much greater, and increasing, encroachment on our freedom in practice than the inability to legally incorporate GPL code into closed software. Both of these restrictions, closed-only platforms and open-only code, are things someone is choosing to offer and you are choosing whether or not to use -- I don't see a difference re: positive or negative rights, technically. (But I'm not super-interested in rights-centric political theory.)
I have been thinking that I should dual license my code under GPL and a license which forbids the enforcement of copyright and patents. People could then chose the one which one align closer to their philosophy.
Something like: Recipient of this software can do what ever they want so long they agree to a contract that bounds them to never do enforcement, in legal and technical form, for copyright and patents.
A complete ban on lawsuits for copyright and patent infringement, including the use of DRM technology that manages and enforces copyright and patents. As a contract it would also not be limited to just the work that I distribute, but would cover everything the recipient creates or has created, be indefinite, and carry harsh fines if broken. I do not think a single person or company who refuses to use the GPL would accept that deal instead. Will you be the first person to accept such a deal and forever stop creating tumors on the world's resources by putting software under proprietary licenses?
sorry, but is the "random corporaions" the tumor: they just use the code, maybe improve it, but keep that private, and the improvement is lost for the rest of the people.
Every written line of code is a liability. By maintaining an internal fork of an opensource you are actually putting a burden on your business and no longer receive the latest patches for free.
>the improvement is lost for the rest of the people.
You are forgetting that businesses sell a product. You can get the improvements by buying their product.
If the business saves money because they used open-source technology, this should translate into lower costs, and it follows that they will reduce prices for their customers (after all, they have to compete with free as in beer!).
If you don't want them to earn money from your MIT code you can always try to erode their profits by implementing similar improvements.
I agree with this, though I don't feel strongly about it. I would choose GPL for my own projects, but other projects using BSD/etc won't prevent me from contributing to them.
The concept of a corporation using your code for their profit without giving back is mostly just an imagination. Today's reality has Microsoft and Apple sharing parts of their codebases under permissive licenses. If a corporation uses your code, chances are they want to give back any modifications to reduce their maintenance load. If the GPL prevents 1 out of 10 corporations from misusing your code, it also probably prevents 8 out of 10 corporations from using (and donating back to) your code.
Well, there is a huge problem: copyleft requires you to share not only changes to the source you used and modified, but also unrelated source of your own. That kills it for most corporations. Some of their code might contain trade secrets, or simply isn't freely relicensable by them because it involves third-party rights. Which is why the LGPL mostly works for corporations but the GPL does not.
Can you elaborate on this? What exactly do you mean by unrelated source, and could you give a concrete example of what a company could NOT do if this code were GPL'ed?
The readline library, for example. If I want to use it, I have to GPL my whole program, even though I probably haven't modified readline at all. The same goes for any other GPL software that somehow gets linked together with my code. Just watch the controversy over whether ZFS kernel modules may be delivered as pre-compiled binaries (the sources are fully available). The output of GPLed compilers and parser generators is also implicitly covered by the GPL, unless there is a clear exemption, as with GCC.
While GPL software which is used as a self-contained entity, like Linux itself or applications like Emacs, is fine, any closer contact with the software you are trying to sell is problematic.
It is in the interests of a minority of programmers, themselves a tiny minority of the general public. Let's not get completely overblown about the stakes here.
The only reason I use the GPL is to safeguard the general public's rights to modify and distribute my software. If anything, I'd argue that releasing GPL software works against the author as it makes making money off it substantially more difficult.
The general public still retains the right to modify and distribute your software with a license such as MIT. Your software does not cease to exist once megacorp uses it for a product. The guarantee of GPL is that megacorp is now obligated to share their modifications (but practically, to go find another library or write their own).
You're missing the forest for a single tree. The obligation to redistribute changes under the same license is not the only reason to use the GPL. The license's safeguarding of end-users' right to fix and modify GPLed software is a far more important reason, and is the reason why I and many people like me choose the GPL.
I don't want to restrict developers and corps. After all, I am a developer who wants control over my hard work. However, I also don't want to erode the rights of end-users - of whom I also am. I don't want to end up in a situation where I can't fix a bug in my own software because a corporation wont let me.
> The concept of a corporation using your code for their profit without giving back is mostly just an imagination.
One of the authors of the Python requests HTTP library has called out Uber for using Python, and almost certainly using requests, without paying any money.
- Did they modify it? You can only submit changes back if you actually changed something.
- as it is server-side software, GPL had not changed much here, you have to provide the source of your program only to those who got delivery of that program, but that is not covering running on your servers.
- None of the licenses discussed require donating money back, only source code changes.
So, yes, I think more companies should fund open source efforts when using open source software. The attention to OpenSSL and its security problems also showed how little funding that development got despite running most of the https on the internet. That indeed is scandalous. But this is covered by none of the current popular licenses.
> as it is server-side software, GPL had not changed much here, you have to provide the source of your program only to those who got delivery of that program, but that is not covering running on your servers.
This is exactly the reason the Affero GPL was invented.
And, yes, there is no legal requirement to donate money. However it shows bad faith and it shows that "a corporation using your code for their profit without giving back" is a reality.
No. It's not a derivative work. Creating a compatible piece of software is not copying. Even if you've seen the original code. If you're not literally copying and pasting code, it's fine.
Copyright protects the code itself from being copied, but the ideas, abstractions, overall design, and even individual APIs are not eligible for copyright protection
I knew someone would post this. Yes, that's technically correct, but 1) this could still go to the Supreme Court, 2) it's now back at the lower court for a ruling on whether copying APIs is "fair use", with all bets on the answer being "yes". So although I wouldn't dismiss the distinction, unprotected vs. fair use makes no practical difference; we can copy freely either way.
I don't think that is accurate. If your work is derived from the source - even if you don't replicate the original - you're infringing copyright. That's why clean room design is so important.
Back in the BIOS clone days, they had someone read the IBM BIOS code, and write a detailed specification from it. They had someone else never look at the IBM BIOS, read the specification, and write code to implement it. (This was the "clean room" approach - the IBM BIOS was never in the room of the implementers.)
But that's still "derived" in the sense that it implements the same functionality. But it's perfectly legal. So "derived" doesn't mean "implements the exact same functionality as the other, and we examined it in detail to make sure".
Why did they do the clean room approach? So that IBM could never claim that they had copied the IBM BIOS, even by re-typing rather than electronically copying.
Well, if you're re-implementing it in Rust instead of in C, you're not copying it, either. You're making a completely new implementation. (Rust doesn't take C code as valid syntax, so far as I know, so typing in the same code from memory wouldn't get you anywhere.)
No, a "derived work" is a legal concept, it means "you copied the original". Being inspired by, or even deliberately designing for API compatibility with an original work is not "derivation" nor is it "copying", as far as copyright law is concerned. Remember only certain portions of the original are eligible for copyright in the first place, e.g. APIs are not, module or class structure is not, ideas are not.
It's important to remember that copyright law protects works from being copied, not from being read. Clean room is a legal tactic used against an aggressive adversary, it's not something that's at all necessary or appropriate in the general case.
Reading GPL'ed code and reimplementing it makes a good case for why the rewrite should also be GPL'ed. I don't know if they did this here, but if so, they should GPL their version as well.
For GNU Octave, we stress very strongly that anyone who has read Matlab's source code is ineligible to contribute to Octave. This is because, should it ever come down to it, we want to be able to ascertain that our implementation is completely original, because nobody has read Matlab's source code. In a similar vein, I'm still waiting[1] for someone to implement the medcouple for Python's statsmodels, because I cannot do it myself.
Nope, just as reading non-GPL'd code and then writing a GPL'd version is legitimate, so too is reading GPL'd code and writing a non-GPL'd version. As long as you're not literally copying and pasting the code.
While you're free to invent any contribution rules you like for Octave, there's really no need for such drastic measures. It might give you peace of mind, but it's not legally necessary - it's perfectly possible for someone who has read the Matlab source code to contribute to your project without copying anything. You'd just rather not have to think about it, which is a pragmatic, but heavy-handed, restriction.
Of course I can read something and not copy it. I can read a book and not copy it. I can listen to a song and not copy it. I can look at art and not copy it. And I can read source code and not copy it.
It's not surprising that your lawyer implied otherwise, as it's "best practice" to guard against every feasible risk, no matter how unlikely. Understand that your lawyer is protecting you against a hyper-zealous misinterpretation of the law, not the actual law.
Correct me if I'm wrong, but it seems like the original point here is not whether someone read the code and then contributed, but whether their contribution might (inadvertently or otherwise) contain Matlab intellectual property because they have been exposed to it. It seems a reasonable safeguard to prohibit those who have seen the source code from potentially contributing in a troublesome manner.
You're quite right about exposure to code: it's prudent to be hyper-cautious. But the original point of contention was the suggestion that it's legally required to be this cautious, that people simply don't have the right to read something and then write something similar which isn't a copy - and that is false.
> This is because, should it ever come down to it, we want to be able to ascertain that our implementation is completely original
I appreciate that this is how it works today, but isn't that a completely outrageous idea? A well read, well travelled person will have seen countless things that will influence their future behaviours. It is not uncommon to completely forget a particular source of inspiration (sometimes we falsely attribute to someone else, and even other times we attribute it to ourselves!)
Yes, it is. This is an excessive level of caution which does not reflect the permissiveness of actual copyright law. However, there is the concept of "unintentional copying", which is effectively what you're describing, and which is still considered an infringement under copyright law. It's very, very unlikely that such an argument would be used in court with regards to software, though, and even more unlikely that a jury would be convinced by it. It's really more of an argument for subjective and creative endeavours, such as hearing a music clip or seeing a logo design or piece of art, which leaves a strong impression.
Lawyers tend to fuel the FUD around this with "best practice" concepts such as "clean room", which is a drastic overreaction. It's akin to saying "if you want to be an author you should never read any books, in case you accidentally copy one of them". Sigh.
A further complication for us is that the Mathworks is rich and amoral while GNU Octave is tiny and idealistic. If we ever become large enough for them to notice us, we have to make sure that we never give them the slightest argument in their favour. Should our code ever end up looking similar (similar variable names, similar structure), we have to be able to say that it's purely coincidental.
That's a fine reason for going the extra mile to protect yourself against a hostile adversary but understand that it really is "the extra mile". While it may be prudent in your situation, it is heavy-handed and unnecessary in general.
Well, that's a very large assumption, and it is a big part of that answer.
I have sent in a few PRs to this project, and I have never done more than maybe glance at the source of coreutils, and that was for unrelated reasons. Can't speak for the regular contributors, though.
This is very very weak grounds for any sort of lawsuit.
But if having glanced at GPL source code prevents implementing similar functionalities in an entirely different language, that's a pretty darn strong argument for me to never look at GPL code again.
Your lawyers are peddling FUD because they make money that way. "Hey, it looks like you need another legal agreement, can't be too safe!" The reality is that an "unintentional copying" claim against source code makes for a very weak lawsuit and it's close to unimaginable that such a case would even make it into a court room.
You're free to read whatever you like. Don't let anybody tell you otherwise.
You're giving a lot of strongly-worded advice/opinions on legal issues in this thread. Are you a lawyer? Can you point to any case-law to back up what you're saying?
I'm not a lawyer, and I don't really know who's right in this thread, but I'd find any citations you have really interesting to read.
Here's a crazy thought: when lawyers are in court, one of them is always wrong. Imagine that. A lawyer being wrong. Mind blown. Is your lawyer a software engineer? If not, he doesn't know anything! He has literally no idea. He can't even comprehend software. It would melt his brain.
So you ask, how can I, the holder of a meagre so-called "Computer Science PhD", comprehend the holy (and unspeakable) knowledge of those who have spent an unimaginable three years studying law? Who am I to question our very gods? Well, sir, I give you "The Wikipedia":
Perhaps we should have it destroyed for spreading the "unspeakable" knowledge, rightfully known only by our lawyer-priests and spoken amongst them in their own tongue.
Wow... No one said lawyers can't be wrong, but the law is complex, and if you haven't studied it, it's a pretty good bet you don't really understand all the details. It's pretty funny to see you belittle lawyers in the same paragraph you say any lawyer's mind would melt by the complexity of software. Having a PhD in CS doesn't make you an expert in the law, the economy, or any other field unrelated to CS.
I was going to respond to the wikipedia article, but I'm realizing that would be a waste of my time. Good luck with your crusade against the almighty lawyer-priests.
OT, but the fact that one side loses doesn't mean they're wrong, just as losing a battle doesn't mean a side doesn't know how to fight. It might just mean that the other side was better; or got lucky.
> Your lawyers are peddling FUD because they make money that way.
No, the SFLC works pro-bono.
> You're free to read whatever you like.
I wish that were true, but many current laws say otherwise. You do not sound like you are aware of those laws, so I take it you're not a lawyer. You're just hoping the world is as free as you say it is. I wish it were too, but we have to be pragmatic and work in the world we have while we strive for the world we want.
Sigh... I feel like I'm not getting through to you here. Yes, your lawyer works pro-bono, but the rest of the time he works for money, and his entire conception of legal advice revolves around that. What's good for the goose is good for the gander: your lawyer isn't paid to understand the law. He's paid to protect you from other lawyers. And other lawyers peddle FUD, so your lawyer has to protect you against FUD. Of course he's going to advise you to protect yourself! And if you've got an aggressive adversary, then you probably should.
But you don't need to protect yourself. You're legally allowed to read stuff and write something similar. Just don't copy it.
> I wish that were true, but many current laws say otherwise
I'm going to disregard this statement (and its conspicuous lack of citations) because you're not a lawyer :)
> You're just hoping the world is as free as you say it is.
No, I'm just reading Wikipedia:
> Clean room design is usually employed as best practice, but not strictly required by law.
Can you guarantee that you will never write a piece of software under a non-GPL license for the rest of your life? I certainly can't and I suspect that few programmers other than RMS can.
It might. I'm hedging because I can't even assert that I've _never_ looked at it; I've been doing C off and on for a very long time. But regardless, my two or three patches aren't the worry here. Or rather, every patch is, but if it came into question, they'd be easily removed.
Nope. Reading something does not make any future work you do derivative. Discouraging people from reading and understanding the work of others is a particularly bad idea.
I'm not sure picking a program where the portion that does any actual work is 3-6 lines long is a good reference for whether they are copying code/algorithms. There's so little of substance to actually do differently that the chance of them looking similar in that respect is fairly high.
You know that whoami didn't originate with GNU, right?
The GNU coreutils are themselves rewrites of earlier BSD tools. The Rust version AND the GNU whoami.c both look similar to the BSD original https://github.com/weiss/original-bsd/blob/master/old/whoami.... Hell, the rust version is clearly a lot closer to the BSD original than to the GNU clone.
Honestly if you're going to have an opinion on this issue, you should try to educate yourself on all of the Unix that predates GNU.
No it really doesn't. Code which implements the same API is going to look pretty similar, for example ReactOS looks a hell of a lot like Windows. But it's not been copied.
IMO it's a little more complex: there's the question of copying interfaces, and the question of reverse engineering the implementation behind them. Many man pages are written such that it's clear how you should implement the logic so that all flags, etc. are interpreted in a well-defined and consistent manner with the original. But it's a billion times easier to just check out the source code for some of the tools and rewrite. Much easier, but possibly a different legal situation.
Well there's the rub - once you see the source code, it's tough to say you weren't influenced by it. In for-profit endeavors this kind of thing is typically done by having two separate groups of people, one that sees the competing product and writes detailed descriptions of behavior, and one that never sees the product, only the product of the first group. If you're careful about this and go to pains to keep the groups strictly separated, you're in the clear. If you've read the source code and go and write very similar source code, you're considered tainted from an intellectual property perspective. It doesn't prove you copied them, but you certainly lose a lot of moral high ground. In this case, I doubt many of the copyright owners are going to care, let alone actually pursue legal action. But seeing the source code is legally dangerous at times. I am not a lawyer.
We're talking copyright here, not patents. If I read copyrighted C code, and rewrite it in Rust, I don't think a copyright claim can touch you, no matter how similar they are.
Note well: IANAL. This is my understanding of copyright law, not legal advice.
IANAL also. But it's important to distinguish a copy (which is just more or less copying what you read in C) from a derivative work (a work that could not have been made, or would have been made noticeably differently, in the absence of the parent work). Just because you read Harry Potter 5 years ago and only now bother to write a fanfic in French using your memory of that world as a base, maybe keeping most names or subtly changing them a bit, does not mean your new work isn't derivative. It probably shouldn't be derivative, given how much of our creative culture is remix upon remix, but hey, that's a much more extreme position that gets close to abolishing copyright entirely.
Code is just tricky though and metaphors for books and other things often break down easily... API / module substitution copyright seems silly, but maybe only to people who understand programming or who can grasp the metaphor that copyrighting an interface is like saying any book that uses numbered chapters (instead of custom named chapters) is a derivative work of the first book that used numbered chapters. Some code "could be but one way", naming included / irrelevant (I think a lot of SO code is like this, there are very few ways to glue together certain bits of code to do small thing X in context C), some of it is more like performance art, some of it is just almost pure math, and some of it works as a whole to solve one particularly hard problem which in itself has business value, just as music has business value by solving the problem of being appealing to listen to so that people buy it. Copyright law seems ill-equipped to handle it. Strategically though, I'd go with what the lawyers advise -- a lawsuit that I win can still be more costly than just playing it safe to begin with.
> you're considered tainted from an intellectual property perspective.
No you're not. The law has exactly zero to say about this. The idea of "clean room" development is nothing more than a legal tactic used to ward off potential lawsuits from an aggressive adversary. It's in no way necessary to do this.
That case is still ongoing. Right now they are fighting it out in a district court, trying to decide who will be on the jury, and whether or not statutory damages should be decided.
> Does anyone have more solid info on whether the GPL would apply in this case?
It would if they did indeed read GPL'ed code to implement this. The SFLC has advised us to make sure we do not read Matlab code when implementing Octave code. The only thing that is known to legally work is clean-room reverse engineering. It's ok to read independently-written specifications of how the software works and reimplement that. Reading the software itself and reimplementing it constitutes a strong case for derivative work.
Porting a program from one language to another would seem to be a pretty clear example of a derivative work. In this case, the original license (GPL) requires that any derivative works be offered under the same license.
This is a terrific project idea, but I agree that it should likely be under the GPL. Even if it is not legally derivative, it would be nice to preserve the GPL.
Sometimes maximising the likelihood of a set of components being used is more important than potential licensing constraints. In this particular case, if they're not GPL, they're more likely to be used in various *BSD systems and in a variety of embedded contexts.
Improving the general security and reliability of all systems seems like it might potentially be a more valuable goal.
> In this particular case, if they're not GPL, they're more likely to be used for various *BSD systems and in a variety of embedded contexts.
This is a bit of a bugbear. The most common way to use the coreutils is through standard Unix pipes, which does not create a derivative work. I don't know of anyone who has found the copyleft of the coreutils prevents them from doing anything they would like to do. The situation with busybox and Linux is different, as the coupling there was much tighter, and without it we would not have OpenWrt.
I understand and agree with your particular assertions about copyleft and unix pipes, etc. but disagree that it's a "bugbear".
I know from experience some organizations are perfectly willing to contribute changes back upstream on MIT/BSD-licensed components, but avoid GPL components simply because of the additional constraints and potential liability concerns that have to be dealt with. Apple is a perfect example given the additional conditions of the GPLv3.
I'm with you there. Plus, mobile and embedded means lots of stuff gets integrated tightly with hardware. Those people will not risk hardware or interface secrets in firmware being released under GPL. So, having BSD'd stuff for them to use in stuff we have to buy is a nice quality improvement for us. Even Stallman admits it's better for things standardized between proprietary and FOSS like codecs where they surely won't use GPL stuff.
Rust is a fine language, no doubt. What worries me is the exact rewrite of the C code using the unsafe construct.
Also, a little off-topic: can fellow HNers shed more light on the debugging tools for Rust, and on debugging experiences from a security-research point of view?
It looks like there are 134 instances of 'unsafe' in 23,000 lines of Rust. And it looks like a lot of that unsafe is there to FFI into libc.
Rust works with GDB, so you end up debugging like anything else. IDE integration is being actively worked on, and sorta-kinda works in my understanding.
What is the roadmap on getting rid of the need for libc?
Given how terrible libc is, I personally would make that a high priority, though I guess in Linux you can't even start up a process without libc (maybe that is a misunderstanding?), which makes the situation less clean, but at least you could get to a point where you never call back into it after entry into main.
Realistically? Never. If you're running on a UNIX system, libc is your only portable interface to the system. You can use the syscall layer on some *nix-like platforms (such as Linux), but it will leave you to replicate a lot of the work that libc already does.
On Solaris, as one example, there is no stable syscall layer -- libc is your only interface to the system. There's good reason for that too; on Solaris, libc is updated with security and performance improvements on a regular basis for each new hardware generation and other improvements in the operating system. It automatically accounts for things that might cause pipeline stalls (think memcpy, etc.) in newer processor generations, and so on.
Now with that said, what you could advocate is that libc, etc. be rewritten in Rust, but exposed via the "C" ABI convention that Rust provides. It wouldn't be as great as native Rust, but it would still be an improvement.
There are no compelling arguments for getting rid of libc that I've heard yet for platforms where libc is well-maintained. On Solaris, there are strict compatibility guarantees for every interface, so applications can assume changes in libc will never break them as long as the interface they're using lists an appropriate stability level in the manual page.
A world where every language implements its own version of libc seems like it would increase the number of defects, not decrease them. We could of course argue that those implementations might be better than libc, but I think whatever possible benefit there might be there is lost in the likelihood that each language will have its own flaws in its implementation.
I believe the best view of an operating system is as a fully-integrated set of components. Everything from the kernel up to the system libraries should look and act consistently, and that's impractical to do if every component is viewed as interchangeable. Integration from top to bottom in an OS stack can produce incredibly great results and provide unparalleled reliability, availability, and performance.
You don't need a C standard library to run programs on Linux (nor any other system).
My guess is that the Rust team consider that relying on a well tested C codebase is safer than writing tons of unsafe platform-specific code, to replace the functionality offered by libc.
Anyway, I think the Redox project does have a Rust standard library that calls the OS directly and doesn't need a C library, since they are writing a 100% Rust-based stack.
> My guess is that the Rust team consider that relying on a well tested C codebase is safer than writing tons of unsafe platform-specific code, to replace the functionality offered by libc.
I'm copying that to my notes to use as a clear and concise explanation to people who want to work 'around' libc.
Do you suggest they should use raw system calls? On most systems your options are either calling standard C functions or doing raw calls (not portable).
And what about systems that don't have a stable interface below dynamically linking that platform's system-provided libc? Just let every program break when the system updates?
Ah. Well, right now, the standard library relies on a libc of some kind. If you use only core, then there's no reliance on a libc.
There has been some interest in making it easy to have stdlib without a libc; and some refactorings to std that may make it easier to do so, but those haven't come to fruition just yet.
True. I taught myself C using vim + gcc under Linux. For those wondering what exactly you learn:
1. You learn the command line interface to your compiler, which is invaluable and something you'll have to learn at some point, even if you start off on an IDE.
2. Similarly you learn the command line interface to the compiler's support tools like make, linker, debugger, profiler, source code revision control, grep, strings and so on, and more importantly how the whole process of 'write-compile-execute-debug' cycle is done.
I learned how to program without an IDE, and I'm a pretty big fan of it. And strongly typed functional languages with type inference tend to be really easy to write and refactor without needing specialized IDE tasks for the job.
That being said, nowadays I use an IDE because it is extremely helpful to have autocomplete (which is okay with Racer+Vim, but kinda hacky) as well as the hover for type information (name of types, type signature, etc.). Without the hover information, I usually end up doing ridiculous things like writing bogus explicit types to see what type an undocumented function from a library will return after compiling (Is it Option? or Result?). That is really annoying.
Here are the features I like in an IDE that make me very productive:
- Autocompletion
- Mass rename
- Source formatting
- Integrated debugger interface with breakpoint insertion and overlying of state on source
- Integrated VCS control
- Automated deploy
- Error display
- Automatic importing of modules
- Source cleanup (Automatic loop transformation)
- Automatically building my project
That's just a few things that I like an IDE to have. Some provide even more features that I like.
I understand that there are other tools that do the tasks better, maybe even faster, but that's not what I want. I want to be able to learn one thing, and learn how to use all of its features well.
Error display: tmux, terminator or iTerm2 split panes
Rust has pretty good tool support in vim and Atom, for example (can't speak for the rest). We don't have stellar IDE support yet, but it's on the to-do list: https://www.rust-lang.org/ides.html
Personally, I don't miss IDEs. I like the approach of small, single-purpose, composable command-line utilities more.
As I said, I don't want to learn how to use 5+ tools. I want to be able to install one thing, have everything work out of the box, and learn how to use their uniform and standard way of doing things.
I acknowledge that learning lots of tools just to start using a new language is a bit much.
However, the reason why so many people in this thread persist in suggesting that these tools are worth learning is that turns out most of these tools plug-and-play with whatever new language you feel like learning 6 months or 2 years or more from now.
Vim is one of the more popular editors to use in conjunction with this plug-and-play philosophy; if you have a change of heart and want to take a dive into learning to use Vim as such, I'd recommend Vim as an IDE[1], which addresses a lot of the points you brought up in your list of things you wish an IDE had.
The problem with composable tools is that some things are really hard to compose and separate out, and have to be integrated: syntax highlighting, autocompletion and interactive debuggers are really hard to treat as separate programs.
You probably noticed how useless racer is on the command line: type code in your text editor and then when you want a completion, you go to your CLI and type "racer myfile.rs row,column" and there are your completions! :)
Jokes aside: it's clearly superior (and intended) as an integrated tool than as a command line tool.
I'd argue that a debugger can offer the same kind of power being an integrated debugger as the (admittedly silly) racer example.
> Error display: tmux, terminator or iTerm2 split panes
Split panes are a poor substitute for actual error display integration; highlighting the errors in the source-code viewer, combined with a cross-linked list of errors is far more useful.
Yeah. I'm a vim guy myself, but I totally understand the appeal of IDEs. Luckily Rust is a well-defined language (unlike C and C++), so building a nice IDE shouldn't be too arduous.
The last two items are definitely achievable. As a saying goes, "vim is my editor, Linux is my IDE." I'd rather learn a handful of tools that will last me my career and be valuable in many different circumstances in and out of work and more formal "programming" than learn a new IDE per language and/or a new Master IDE for multiple languages every few years.
Autocomplete is a godsend, I don't type nearly as much as I hit my arrow keys+tab. Whenever I have to write anything outside of ObjC (in Xcode) or Java (in Android Studio), I get annoyed very quickly.
This is pretty common, and the advantages of Rust won't overcome people being inherently lazy (not in a bad way... lazy in this sense is really more efficient).
People love C# for that very reason, once they get used to Visual Studio, convincing them to switch to something with very limited IDE support is a tough argument. Most people have better things to do with their life. Rust will get there though, just give them time.
I write C# at work using VS, and am working on getting omnisharp running so I can use VIM here instead. I have fully converted from being an IDE person to a VIMMER. So many helpful things, and keyboard binds that I can't believe I ever lived without. VsVim just isn't the same, and cannot be used in VS2008 which we unfortunately use most of the time.
I tried to use Rust briefly but found that discovery was very difficult without autocomplete - I'm used to Swift's SourceKit, which is quite good if it hasn't crashed.
It is interesting that no one has mentioned the Rust project's internal IDE support. They are developing something called Oracle. It is basically everything an IDE needs (autocomplete, reference finding, error checking, etc.). Oracle isn't finished yet, but once it is ready, I think every editor/IDE out there can provide the best experience for Rust by simply calling a bunch of APIs, which is a very simple thing to do compared to writing their own AST and autocomplete.
About the project: not being GPL is a serious bummer for me. I don't want the GNU project to become more isolated than it already is, and anyone with even a basic understanding of the politics behind the scenes will understand that GNU as a foundation has done a lot of good things for the programmer/developer/hacker community. By far more than any other player in this area.
And since I have worked on glibc for a while and wanted to use Rust just as an experiment, I was seriously considering working on this project, if only it were GPLv3.
I would, but the last time I installed atom and tried to start it the base text editor, without any added plugins, took 3 minutes to start and used ~600mb of system memory.
In comparison, Eclipse starts in under a minute and uses about half that on my system.
Atom, when I tried it, was much too large for my system. I might look at it again.
If they're not doing a 'clean room' rewrite, I'd agree. Otherwise? That's their choice, and it's one I'm glad to see. This could be quite useful for OSes that aren't GPL'd, like the BSDs, or Redox.
Sure, but the FSF and the GNU project were reactions to proprietary software becoming more common. Free software wasn't recognised as such until Stallman saw the issue and decided to act. Saying that "free software existed before the FSF" is ignoring the fact that free software would've died almost entirely without the FSF and GNU.
Some people worry about the file size (which should be similar), but I worry about the speed. How much faster can we grep or sed with a Rust compiled grep vs a C compiled grep?
Likely not as fast as with a grep written in C, but FreeBSD's grep is (was?) pretty slow compared to the GNU version and nobody really cares (read: it's probably fast enough).
There shouldn't be any theoretical reason that a Rust grep should be slower than a C grep. In practice, a Rust grep would be a new code base, and the existing grep has seen a lot of development over the years.
The rationale talks about easy portability to Windows, but I'm skeptical - the underlying API and filesystem is different enough that this is likely to cause problems. '\' vs '/' to start with.
Supported, but it's a second-class citizen: for example, no tab completion of directories at the cmd prompt when using forward slashes. Some commands will not accept it;
e.g. type c:\home\foo.txt -> cats it out
type c:/home/foo.txt -> 'The syntax of this command is incorrect.'
whereas cd will except either forward or backslash
I write a lot of single source Windows/OSX/Linux utilities and the nice thing about Windows is that back and forward slashes are interchangeable so you can just write for POSIX.
Most languages (including Rust) already ship path libraries that abstract this away from you. As long as you use these libraries consistently, you should never hit these issues.
I am not an expert, but the general difference, in my understanding, is that the BSD coreutils is significantly "lighter", both in features and in code size.
(on Windows use MinGW/MSYS or Cygwin make and make sure you have rustc in PATH)
This made me chuckle - if you have MSYS installed, which already comes with a windows port of coreutils, why would you want to use it to build a windows port of coreutils?
Because you don't trust the memory safety of programs written in C. Or because you think the coreutils would progress faster with a language with modern semantics like Rust.
Complete rubbish: Many GNU, Linux and other utils are pretty awesome, and obviously some effort has been spent in the past to port them to Windows. However, those projects are either old, abandoned, hosted on CVS, written in platform-specific C, etc. Rust provides a good, platform-agnostic way of writing systems utils that are easy to compile anywhere, and this is as good a way as any to try and learn it.
Cygwin has a complete, up-to-date port of the Coreutils for Windows.
The POSIX + C language provides a good, platform-agnostic way of writing systems utils that are easy to compile anywhere.
Is the cygwin port just some different build files, or is it genuinely different code? A Rust codebase probably wouldn't (in most cases) need to be modified to run on different supported platforms.
I'm not going to comment about cygwin (because its build process is not as transparent) but instead about msys2, the spiritual successor to cygwin. See for yourself: https://github.com/Alexpux/MSYS2-packages/blob/master/coreut... is the `pacman` style `PKGBUILD` used to build the `coreutils` package.
That looks pretty clean, but they're still applying what looks like 5 patches, which would need to be maintained separately. It's certainly not as bad as a full port, but it's still not the same codebase. Rust's cross-platform efforts would (I think?) obviate that process.
But, nevertheless, it's not exactly a port of Coreutils to Windows per se.
For instance, let's take the "cp" utility.
A Windows port "cp" should be able to copy Windows-specific file attributes. Does the Cygwin "cp" do that? How about NTFS forks?
I don't believe that you can write some portable code in Rust, and have a "cp" utility which correctly handles every quirk of every OS's file structure.
Well, that would be a statically linked Rust version vs a dynamically linked C version. Either of those languages can produce binaries of either size. Statically link a MUSL libc and it'll be bigger than 6kb, dynamically link the rust code and use the system allocator instead of jemalloc and it'll be smaller than 2MB. (Or rather, it should be, I haven't literally tried it. The smallest known Rust executable is 151 bytes.)
That's what I wanted to hear: can the Rust coreutils presented here actually be compiled and linked equivalently to the C versions, and can we see the resulting sizes? Not "if I don't use any library I can get a 151-byte binary", but exactly these coreutils as presented.
That would give a good idea of whether the replacements can actually be comparable.
The second step would then be to compare the number of features implemented and the behavior.
By default Rust statically links the Rust runtime library, so yeah, 2 MB is roughly it. You can get around it by linking dynamically, and uutils also supports a busybox-style single-binary + links installation (that is, check argv[0], or whatever the equivalent in Rust is, to decide which utility to run).
Is there a real need for this project? I mean, GNU coreutils has been around for a long time; could this really be better? Do you intend to replace GNU coreutils?
Not to be mean, just a genuine question (and yes, I read the Why section of the readme).
Multiple people have sought to rewrite coreutils, whether it be for fun, exercise, or out of frustration with the current utils. I know folk on 4chan's /g/ were rewriting them once, and the people at suckless are trying to do the same, only with a lot of options and features stripped out due to "bloat" (my word not theirs).
Your second question presumes that just because something is old, it must be the best. Traditional things can indeed work out well, but challenging tradition is how progress is made. And you can't know whether the coreutils can be improved if you don't try, right?
What is the primary danger of bugs in coreutils? Are there any setuid programs in there that could be used to elevate? Or is it the ability to exploit shell scripts using them? Or files that when you manually process them with coreutils install a trojan?
Honestly, I wouldn't mind a non-GNU alternative to their stuff that isn't either a) feature-bare like Busybox or b) tightly-coupled with other software like the BSD utilities.
Why non-GNU? To completely and thoroughly dispel the notion of "GNU/Linux".
Your last question seems reasonable, but the first is a classic tactic taken by that nerd subculture who loves to shit on things. This seems particularly true in light of your edit indicating you already know the answer.
I don't like this trend of new projects choosing weak free software licenses. It's as though they wish to ignore the history of the world they live in -- free software would not exist in the form it does today without copyleft. Copyleft is the one defense free software developers have against corporate monopolies and proprietary splitting. For some reason though, everyone who works in $NewLang seems to not care about this at all.
"Those who cannot remember the past are condemned to repeat it"
I'd like to think that in the future, when "the open-source bubble" bursts and companies fall back to a 70s-era mentality, developers would rediscover why free software is important and a new movement would be born.
Sadly, I think hardware will be permanently locked down by then. Perhaps by FCC-style law.
[1] https://news.ycombinator.com/item?id=11312918