Show HN: BitKeeper – Enterprise-ready version control, now open-source (bitkeeper.org)
384 points by wscott on May 10, 2016 | 306 comments



The grand irony is that Larry was one of the earliest advocates of open sourcing the operating system at Sun[1] -- and believed that by the time Sun finally collectively figured it out and made it happen (in 2005), it was a decade or more too late.[2] So on the one hand, you can view the story of BitKeeper with respect to open source as almost Greek in its tragic scope: every reason that Larry outlined for "sourceware"[3] for Sun applied just as much to BK as it did to SunOS -- with even the same technologist (Torvalds) leading the open source alternative! And you can say to BK and Larry now that it's "too late", just as Larry told Sun in 2005, but I also think this represents a forced dichotomy of "winners" and "losers." To the contrary, I would like to believe that the ongoing innovation in the illumos communities (SmartOS, OmniOS, etc.) proves that it's never too late to open source software -- that open source communities (like cities) can be small yet vibrant, serving a critical role to their constituencies. In an alternate universe, might we be running BK on SunOS instead of git on Linux? Sure -- but being able to run an open source BK on an open source illumos is also pretty great; the future of two innovative systems has been assured, even if it took a little longer than everyone might like.

So congratulations to Larry and crew -- and damn, were you ever right in 1993! ;)

[1] Seriously, read this: http://www.landley.net/history/mirror/unix/srcos.html

[2] The citation here is, in that greatest of all academic euphemisms, "Personal communication."

[3] "Sourceware" because [1] predates the term "open source"


Yeah this irony is not lost on me. But in both cases, the companies acted in self interest. Neither had the guts to walk away from their existing revenue stream. It's hard to say what would have happened.

It's been an interesting ride, and if nothing else, BK was the inspiration for Git and Hg; that's a contribution to the field. And maybe, just maybe, people will look at the SCCS weave and realize that Tichy pulled the wool over our eyes. SCCS is profoundly better.


Thank you for contributing to the development and evangelizing of DVCS, directly (BK) and indirectly (the ideas and inspiration for git, hg).


It's probably fair to say that DVCS accelerated the growth of the entire software industry.

Was BitKeeper the first version control system to "think distributed"?


Sun's TeamWare [1] was probably the first real distributed version control system. It worked on top of SCCS. Larry McVoy, BitKeeper's creator, was involved in its development. I believe BitKeeper also uses parts of SCCS internally.

[1] https://en.wikipedia.org/wiki/Sun_WorkShop_TeamWare


We did a clean room reimplementation of SCCS and added a pile of extensions.


So when Tridge "reverse engineered" BK, he basically reimplemented SCCS?

https://lwn.net/Articles/132938/


Nope, he did a clone and pull. It was essentially rsync or tar; he had no awareness of the file format.


NSE begat NSE-lite begat TeamWare begat BK begat git. Or so says Cantrill.


That is my understanding, yes -- but the NSE and (McVoy-authored) NSElite chapters of the saga pre-date me at Sun. Before my "Fork Yeah!" talk[1][2], from which this is drawn, I confirmed this the best I could, but it was all based only on recollections of the engineers who were there (including Larry). I haven't found anything written down about (for example) NSElite, though I would love to get Larry on the record to formalize that important history...

[1] https://www.usenix.org/legacy/events/lisa11/tech/slides/cant...

[2] https://www.youtube.com/watch?v=-zRN7XLCRhc


The NSE was Sun's attempt at a grand SCM system and it was miserably slow (a single-threaded, FUSE-like COW file system implemented in user space). I did performance work back then, sort of a jack of all trades (filesystem, VM system, networking, you name it), so Sun asked me to look at it. I did, and recoiled in horror; it wasn't well thought out for performance.

My buddies in the kernel group were actually starting to quit because they were forced to use the NSE and it made them dramatically less productive. Nerds hate being slowed down.

Once the whole SCM thing crossed my radar screen I was hooked. Someone had a design for how you could have two SCCS files with a common ancestry and they could be put back together. I wrote something called smoosh that basically zippered them together.

Nobody cared. So I looked harder at the NSE and realized it was SCCS under the covers. I built a pile of perl that gave birth to the clone/pull/push model (though I bundled all of that into one command called resync). It wasn't truly distributed in that the "protocol" was NFS, I just didn't do that part, but the model was the git model you are used to now minus changesets.

I made all that work with the NSE, you could bridge in and out and one by one the kernel guys gave up on NSE and moved to nselite. This was during the Solaris 5.0 bringup.

I still have the readme here: http://mcvoy.com/lm/nselite/README and here are some stats from the 2000th resync inside of Sun: http://mcvoy.com/lm/nselite/2000.txt

I was forced to stop developing nselite by the VP of the tools group, because by this time Sun knew that nselite had won and NSE had lost, so they ramped up an 8-person team to rewrite my perl in C++ (Evan later wrote a paper basically saying that was an awful idea). They took smoosh.c and never modified it, just stripped my history off (yeah, some bad blood).

Their stuff wasn't ready so I kept working, but that made them look bad: one guy with some perl scripts outpacing 8 people with a supposedly better language. So their VP came over and said "Larry, this went all the way up to Scooter, if you do one more release you're fired" and set back SCM development almost a decade; that was ~1991 and I didn't start BitKeeper until 1998. There is no doubt in my mind that if they had left me alone they would have had the first DVCS.

Fun times, I went off and did clusters in the hardware part of the company.


Wow, jackpot -- thank you! That 2000th resync is a who's who of Sun's old guard; many great technologies have been invented and many terrific companies built by the folks on that list! I would love to see the nselite man pages that the README refers to (i.e., resync(1) and resolve(1)); do you happen to still have those?

Also, even if privately... you need to name that VP. ;)


There used to be a paper on smoosh here: http://www.bitmover.com/lm/papers/smoosh.ps but it's gone now. Do you mind putting it back? I'd like to read it again.



You weren't the only one to do SCCS over NFS. The real-time computer division of Harris did it too. That version control system was already considered strange and old by 2004 when I encountered it.


1 man using perl can outpace 8 on C++. Who would've thought? /sarcasm. But seriously, I think this is one of the classic instances of what is now quite common knowledge about dynamic scripting languages: They let you get things done MUCH faster. I think the tools group learned the wrong lesson from this, but OTOH, who would want to start developing all of their new software in perl? And given that python hadn't caught on yet, there really wasn't much else out there in the field.


> there really wasn't much else out there in the field.

What about shell scripts?


Shell scripts are even more miserable to write than perl, and are missing a lot of features you want for most applications


Or LISP.


Firstly, the mention of LISP would have probably sent most of Sun screaming and running for the hills at the time. Secondly, LISP has never been very good at OS integration, one of the most important things for many software projects.


>> Evan later wrote a paper basically saying that was an awful idea

Is this paper available online? Thanks.


https://www.usenix.org/legacy/publications/library/proceedin...

And for the record, Evan was somewhat justified in not saying I had anything to do with Teamware since I made his team look like idiots, ran circles around them. On the other hand, taking smoosh.c and removing my name from the history was dishonest and a douche move. Especially since not one person on that team was capable of rewriting it.

The fact remains that Teamware is just a productized version of NSElite which was written entirely by me.

If I sound grumpy, I am. Politics shouldn't mess with history but they always do.


Good to know I (probably) got it right.


BK and Monotone begat Git


There was a paper published on a DVCS using UUCP (!) in 1980: "A distributed version control system for wide area networks", B O Donovan, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=tru...


The date of publication on Xplore is September 1990 though?


> Thank you for contributing to the development and evangalizing of DVCS, directly (BK) and indirectly (the ideas and inspiration for git, hg).

I concur.


> the companies acted in self interest

Sun could, at least, make a profit building workstations and servers and licensing chips.

It's actually very sad they don't build those SPARC desktops anymore.


I really doubt it. There's not enough demand to make them competitive in price-performance. The server chips have stayed badass but are still a niche market. There's even open-source SPARC HW with reference boards for sale from Gaisler. That didn't take off.

I liked the desktops but there's no money in them. The market always rejects them. So does FOSS, despite SPARC being the only open ISA with mainstream, high-performance implementations.


> There's not enough demand to make them competitive in price-performance.

There does not need to be demand: Steve Jobs (in)famously said that where there was no market, "create one".

I for one would absolutely love to be able to buy an illumos-powered A4-sized tablet which ran a SPARC V9 instruction set, plugged into a docking station, and worked with a wireless keyboard and mouse to be used as a workstation when I'm not walking around with my UNIX server in hand. Very much akin to Apple Computer's iPad Pro (or whatever they call it, I don't remember, nor is that really relevant).

But the most important point was, and still is, and always will be: it has to cost as much as the competition, or less. Sun Microsystems would just not accept that, no matter how much I tried to explain and reason with people there: "talk to the pricing committee". What does that even mean?!? Was the pricing committee composed of mute, deaf and blind people who were not capable of seeing that PC-buckets were eating Sun's lunch, or what?


"There does not need to be demand: Steve Jobs (in)famously said that where there was no market, "create one"."

What people forget is that Steve Jobs was a repeated failure at doing that, got fired, did soul-searching, succeeded with NeXT, got acquired, and then started doing what you describe. Even he failed more than he succeeded at that stuff. A startup trying to one-off create a market just for a non-competitive chip is going to face the dreaded 90+% failure rate.

"But the most important point was, and still is, and always will be: it has to cost as much as the competition, or less."

That's why the high-security stuff never makes it. It takes at least a 30% premium on average per component. I totally believe your words fell on deaf ears at Sun. I'd have bought SunBlades myself if I could afford them. I could afford nice PC's. So, I bought nice PC's. Amazing that the echo chamber was so loud in there that they couldn't make that connection.

"I for one would absolutely love to be able to buy an illumos-powered A4-sized tablet which ran a SPARC V9 instruction set"

That's actually feasible given the one I promote is a 4-core, 1+GHz embedded chip that should be low power on a decent process node.

http://www.gaisler.com/index.php/products/processors/leon4?t...

The main issue is the ecosystem and components like browsers with JIT's that must be ported to SPARC. One company managed to port Android to MIPS but that was a lot of work. Such things could probably be done for SPARC as well. The trick is implementing the ASIC, implementing the product, porting critical software, and then charging enough to recover that but not more than competition whose work is already done for them. Tricky, tricky.

Raptor's Talos Workstation, if people buy it, will provide one model for how this might happen. Could get ASIC's on 45-65nm really quick, use SMP given per-chip cost is $10-30, port Solaris/Linux w/ containers, put in a shitload of RAM, and sell it for $3,000-6,000 for VM-based use and development. It would still take thousands of units to recover cost. Might need government sales.


The problem is that the number of people who are into Illumos and want a portable Unix server is insignificant. All products come with fixed overheads (e.g. cost of tooling to start production), which have to be divided over the likely customer base. Small customer base == each customer pays a bigger share of the fixed overheads.

Basically, you want something to suit you and a small number of other people, but you won't pay for the cost of having something that "tailor made". You will only pay for the high-volume, lower-cost, more general product. So... that's all you get.


The problems were Sun's failure to recognize that cheap IBM PC clones would disrupt them like they disrupted mainframes, and Sun not trying hard enough to overcome Wintel's network effect. Sun needed to die-shrink old designs to get something that they could fabricate at low cost and compete on price. Such a thing would have cannibalized Sun workstation sales, which might be why they never did it.


Could be. It's gone now, though.


You again! (:-) You know that the VHDL code for UltraSPARC T1 and T2 has been open sourced? If I had enough knowledge about synthesizing code inside of an FPGA, I would be building my own SPARC-based servers like there is no tomorrow!

As long as the code for those processors remains free, and a license to implement a SPARC ISA compliant processor only costs $50, the SPARC will never really, truly be gone, especially not for those people capable of synthesizing their own FPGA's, or even building their own hardware.

Some people did exactly that, a while back. Too bad they didn't turn their designs into ready-to-buy servers.


" You know that the VHDL code for UltraSPARC T1 and T2 has been open sourced?"

That was exciting. It could do well even on a 200MHz FPGA given threading performance. Then, use eASIC's Nextreme's to convert it to structured ASIC for better speed, power-usage, and security from dynamic attacks on FPGA. That it's Oracle and they're sue-happy concerns me. I'd read that "GPL" license very carefully just in case they tweaked it. If it's safe, then drop one of those badboys (yes, the T2) on the best node we can afford with key I/O. Can use microcontrollers on the board for the rest as they're dirt cheap. Same model as Raptor's as I explained in another comment.

Alternatively, use Gaisler as Leon3 is GPL and designed for customization. Simple, too. Leon4 is probably inexpensive compared to ARM, etc.

"SPARC ISA compliant processor only costs $50, the SPARC will never really, truly be gone, especially not for those people capable of synthesizing their own FPGA's,"

Yep.

"Too bad they didn't turn their designs into ready-to-buy servers."

Not quite a server but available and illustrates your point:

http://www.gaisler.com/index.php/products/systems/gr-rasta?t...

Btw, I found this accidentally while looking for a production version of OpenSPARC:

http://palms.ee.princeton.edu/node/381


FPGAs are not magic. A SPARC implemented on an FPGA will never be competitive with consumer-level x86s.


No, they are magic: arbitrary hardware designs run without the cost of chip fabrication. Two non-profit FPGA's, one for performance at 28nm & one for embedded at 28SLPnm, would totally address the custom hardware and subversion problem given we could just keep checking on that one. The PPC cores and soon Intel Xeons already show what a good CPU plus FPGA acceleration w/ local memory can do for applications.

Yeah, buddy, they're like magic hardware. Even if they aren't ASIC-competitive for the best ASIC's. Still magic with a market share and diverse applications that shows it. :)


I thought that was clear? Apparently not...

FPGA's are cheap and good enough for prototyping; once one has working VHDL / Verilog code, it's tapeout time.


Security-wise, an FPGA is superior to consumer-level x86 processors. How do you backdoor an FPGA?


In more ways than you'd know. They're already pre-backdoored like almost all other chips for debugging purposes, in what's called Design for Test or scan chains or scan probes or something. Much hardware hacking involves getting to those suckers to see what the chip is doing.

Now, for remote attacks, you can embed RF circuitry in them that listens to any of that. You can embed circuits that receive an incoming command, then dump its SRAM contents. You might modify the I/O circuitry to recognize a trapdoor command that runs incoming data as privileged instructions. You can put a microcontroller in there connected to PCI to do the same for host PC attacks. I know, that would be the first option, but I was having too much fun with RTL and transistor level. :)


> There's not enough demand to make them competitive in price-performance

High-end chips only need to compete with other high-end chips. And low-end SPARC will not take off now that x86 has taken over.


You make that sound easy. The POWER and SPARC T/M chips are amazing. Yet, Intel Xeon still dominates to the point that they can charge less and invest more. That's with one hell of a head start from Sun and IBM. The others... Alpha, MIPS, and PA-RISC... folded in server markets, with Itanium soon to follow.

You can't just compete directly in that market: you have to convince them yours is worth buying for less performance at higher price and watts. Itanium tried with reliability & security advantages. Failed. Fortunately, Dover is about to try with RISC-V combined with SAFE architecture (crash-safe.org) for embedded stuff. We'll see what happens there.


They do, sort of. Intel were using SPARC cores in at least one of their "Management Engine" devices in chipsets recently anyway. ;)



Yeah, that. I think he got a PhD for RCS and what he should have gotten was shown the door. RCS sucks; SCCS is far, far better.

Just as an example, RCS could have been faster than SCCS if they had stored at the top of the file the offset to where the tip revision starts. You read the first block, then seek past all the stuff you don't need, start reading where the tip is.

But RCS doesn't do that; it reads all the data until it gets to the tip. Which means it is reading as much data as SCCS but only has sorta good performance for the tip. SCCS is more compact and is as fast or faster for any rev.

And BK blows them both away; we lz4-compress everything, which means we do less I/O.

RCS sucked but had good marketing. We're here to say that SCCS was a better design.


> in both cases, the companies acted in self interest. Neither had the guts to walk away from their existing revenue stream.

Why does bitmover have the guts now?


He explained the move in another comment: https://news.ycombinator.com/item?id=11668492


I thought SCCS had the same problems as RCS. What did it do differently?


RCS is patch based: the most recent version is kept in clear text, the previous version is stored as a reverse patch, and so on back to the first version. So getting the most recent version could be fast (it isn't), but the farther back you go in history the more time it takes. And branches are even worse: you have to patch backwards to the branch point and then forwards to the tip of the branch.
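
(To make that concrete, here's a toy sketch of the reverse-delta idea in Python. The patch format is invented purely for illustration, it is not RCS's actual on-disk format, but it shows why the tip is cheap and everything else gets slower the further back you go.)

  # Toy model: the tip is stored as plain text; each older revision is only
  # reachable by applying a chain of reverse patches, newest to oldest.
  def apply_reverse_patch(lines, patch):
      """A 'patch' here is just a list of (op, index, text) tuples."""
      lines = list(lines)
      for op, index, text in patch:
          if op == "del":        # the newer rev added this line; remove it
              del lines[index]
          elif op == "ins":      # the newer rev deleted this line; restore it
              lines.insert(index, text)
      return lines

  def checkout(tip_lines, reverse_patches, steps_back):
      """Cost is proportional to how far back from the tip you go."""
      lines = tip_lines
      for patch in reverse_patches[:steps_back]:
          lines = apply_reverse_patch(lines, patch)
      return lines

  tip = ["line one", "line two, new in rev 3"]
  history = [                        # newest-to-oldest reverse patches
      [("del", 1, None)],            # rev 3 -> rev 2
      [("ins", 1, "old line two")],  # rev 2 -> rev 1
  ]
  print(checkout(tip, history, 0))   # the tip: no patching at all
  print(checkout(tip, history, 2))   # rev 1: every intermediate patch applied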

SCCS is a "weave". The time to get the tip is the same as the time to get the first version or any version. The file format looks like

  ^AI 1
  this is the first line in the first version.
  ^AE 1
That's "insert in version 1" data "end of insert for version one".

Now lets say you added another line in version 2:

  ^AI 1
  this is the first line in the first version.
  ^AE 1
  ^AI 2
  this is the line that was added in the second version
  ^AE 2
So how do you get a particular version? You build up the set of versions that are in that version. In version 1, that's just "1"; in version 2, that's "1, 2". So if you wanted to get version 1 you sweep through the file and print anything that's in your set. You print the first line, get to the ^AI 2, look to see if that's in your set; it isn't, so you skip until you get to the ^AE 2.

So any version takes the same time. And that time is fast: the largest file in our source base is slib.c, 18K lines, and it checks out in 20 milliseconds.
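
(If you want to play with the idea, here's a minimal sketch of that sweep in Python, assuming the simplified insert-only format above; the real SCCS file has more record types, but the extraction loop is the same shape.)

  # Minimal weave extraction for the simplified insert-only format above:
  # keep a line if the ^AI block it sits in belongs to the wanted version's set.
  MARK = "\x01"                      # the ^A control character

  def get_version(weave_lines, version_set):
      keep, out = False, []
      for line in weave_lines:
          if line.startswith(MARK + "I "):
              keep = int(line.split()[1]) in version_set
          elif line.startswith(MARK + "E "):
              keep = False
          elif keep:
              out.append(line)
      return out

  weave = [
      "\x01I 1",
      "this is the first line in the first version.",
      "\x01E 1",
      "\x01I 2",
      "this is the line that was added in the second version",
      "\x01E 2",
  ]
  print(get_version(weave, {1}))     # version 1
  print(get_version(weave, {1, 2}))  # version 2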


I had... much too extensive experience both with SCCS weaves and with hacking them way back in the day; I even wrote something which sounds very like your smoosh, only I called it 'fuse'. However, I wrote 'fuse' as a side-effect of something else, 'fission', which split a shorter history out of an SCCS file by wholesale discarding of irrelevant, er, strands and of the history relating to them. I did this because the weave is utterly terrible as soon as you start recording anything which isn't plain text or which has many changes in each version, and we were recording multimegabyte binary files in it by uuencoding them first (yes, I know, the decision was made way above my pay grade by people who had no idea how terrible an idea it was).

Where RCS or indeed git would have handled this reasonably well (indeed the xdelta used for git packfiles would have eaten it for lunch with no trouble), in SCCS, or anything weave-based, it was an utter disaster. Every checkin doubled the number of weaves in the file, an exponential growth without end which soon led to multigigabyte files which xdelta could have represented as megabytes at most. Every one-byte addition or removal doubled up everything from that point on.

And here's where the terribleness of the 'every version takes the same time' decision becomes clear. In a version control system, you want the history of later versions (or of tips of branches) overwhelmingly often: anything that optimizes access time for things elsewhere in the history at the expense of this is the wrong decision.

When I left, years before someone more courageous than me transitioned the whole appalling mess to git, our largest file was 14GiB and took more than half an hour to check out.

The SCCS weave is terrible. (It's exactly as good a format as you'd expect for the time, since it is essentially an ed script with different characters. It was a sensible decision for back then, but we really should put the bloody thing out of its misery, and ours.)


Huh. Now I wonder how BK resolved this.


Yeah. I suspect the answer is 'store all binary data in BAM', which then uses some different encoding for the binary stuff -- but that then makes my gittish soul wonder why not just use that encoding for everything. (It works for git packfiles... though 'git gc' on large repos is a total memory and CPU hog, one presumes that whatever delta encoding BAM uses is not.)


We support the uuencode horror for compat (and for smaller binaries that don't change), but the answer for binaries is BAM; there is no data in the weave for BAM files.

I don't agree that the weave is horrible; it's fantastic for text. Try git blame on a file in a repo with a lot of history, then try the same thing in BK. Orders and orders of magnitude faster.

And go understand smerge.c and the weave lightbulb will come on.


Yeah, that's the problem; it's optimizing for the wrong thing. It speeds up blame at the expense of absolutely every other operation you ever need to carry out; the only thing which avoids reading (or, for checkins, writing) the whole file is a simple log. Blame is a relatively rare operation: its needs should not dominate the representation.

The fact that the largest file you mention is frankly tiny shows why your performance was good: we had ~50,000 line text files (yeah, I know, damn copy-and-paste coders) with a thousand-odd revisions and a resulting SCCS filesize exceeding three million lines, and every one of those lines had to be read on every checkout: dozens to hundreds of megabytes, and of course the cache would hardly ever be hot where that much data was concerned, so it all had to come off the disk and/or across NFS, taking tens of seconds or more in many cases. RCS could have avoided reading all but 50,000 of them in the common case of checkouts of most recent changes. (git would have reduced read volume even more because although it is deltified the chains are of finite length, unlike the weave, and all the data is compressed.)


Give me a file that was slow and let's see how it is in BitKeeper. I bet you'll be impressed.

50K lines is not even 3x bigger than the file I mentioned. Which we check out in 20 milliseconds.

As for optimizing blame, you are missing the point: it's not blame, it's merge; it's copy by reference rather than copy by value.


I'd do that if I was still working there. I can probably still get hold of a horror case but it'll take negotiation :)

(And yes, optimizing merge matters too, indeed it was a huge part of git's raison d'etre -- but, again, one usually merges with the stuff at the tip of tree: merging against something you did five years ago is rare, even if it's at a branch tip, and even rarer otherwise. Having to rewrite all the unmodified ancient stuff in the weave merely because of a merge at the tip seems wrong.)

(Now I'm tempted to go and import the Linux kernel or all of the GCC SVN repo into SCCS just to see how big the largest weave is. I have clearly gone insane from the summer heat. Stop me before I ci again!)


Our busiest file is 400K checked out and about 1MB for the history file lz4 compressed. Uncompressed is 2.2M and the weave is 1.7M of that.

Doesn't seem bad to me. The weave is big for binaries, we imported 20 years of Solaris stuff once and the history was 1.1x the size of the checked out files.


Presumably if you then delete that first line in the third version, you get something like

  ^AI 1
  this is the first line in the first version.
  ^AE 1
  ^AD 3
  ^AI 2
  this is the line that was added in the second version
  ^AE 2

?


Close. By the way there is a bk _scat command (sccs cat, not poop) that dumps the ascii file format so you can try this and see.

The delete needs to be an envelope around the insert so you get

  ^AD 3
  ^AI 1
  this is the first line in the first version.
  ^AE 1
  ^AE 3
  ^AI 2
  this is the line that was added in the second version
  ^AE 2
That whole weave thing is really cool. The only person outside of BK land that got it was Bram Cohen in Codeville; I think he had a weave.
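
(Extending the earlier sketch to handle the delete envelope: a line is printed only if its ^AI serial is in the wanted set and no enclosing ^AD serial is. Again a simplified illustration in Python, not the real file parser; blocks nest, so we keep a small stack.)

  # Same sweep as before, but with ^AD envelopes: track open blocks on a
  # stack; emit a line only if it's inserted by, and not deleted in, the set.
  MARK = "\x01"

  def get_version(weave_lines, version_set):
      stack, out = [], []           # open (kind, serial) blocks, innermost last
      for line in weave_lines:
          if line.startswith(MARK + "I ") or line.startswith(MARK + "D "):
              stack.append((line[1], int(line.split()[1])))
          elif line.startswith(MARK + "E "):
              stack.pop()
          else:
              inserted = any(k == "I" and s in version_set for k, s in stack)
              deleted = any(k == "D" and s in version_set for k, s in stack)
              if inserted and not deleted:
                  out.append(line)
      return out

  weave = [
      "\x01D 3",
      "\x01I 1",
      "this is the first line in the first version.",
      "\x01E 1",
      "\x01E 3",
      "\x01I 2",
      "this is the line that was added in the second version",
      "\x01E 2",
  ]
  for v, ancestors in ((1, {1}), (2, {1, 2}), (3, {1, 2, 3})):
      print(v, get_version(weave, ancestors))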


It's sort of surprising then that a delete doesn't just add an end-version on the insert instead:

  ^AI 1..2
  this is the first line in the first version.
  ^AE 1..2
  ^AI 2
  this is the line that was added in the second version
  ^AE 2
This way the reconstruction process wouldn't need to track blocks-within-blocks.


Interesting. "^AI Spec" where Spec feeds into a predicate f(Spec, Version) to control printing a particular Version? Looks like you could drop the ^AE lines.


Sounds like equivalent representations, no? Limit the scope of the I lines or wrap them in D lines.


Probably missing something. Both are working on one file at a time and have some form of changeset. One is adding from back to forward (kind of), the other from forward to back (RCS). Not sure where the reduction of work is coming from.


Sun must like the scat names :) I used to use the scat tool for debugging core files.


That actually is pretty neat


Aha, so that's where bzr got it from. :-)


bzr got more than that from BK; it got one of my favorite things, per-file checkin comments. I liken those to regression tests: when you start out you don't really value them, but over time the value builds up. The fact that Git doesn't have them bugs me to no end. BZR was smart enough to copy that feature and that's why MySQL chose bzr when they left BK.

The thing bzr didn't care about, sadly, is performance. An engineer at Intel once said to me, firmly, "Performance is a feature".


Git's attitude, AFAIK, is that if you want per-file comments, make each file its own checkin. There are pros and cons to this.

Performance as a feature, OTOH, is one of Linus's three tenets of VCS. To quote him, "If you aren't distributed, you're not worth using. If you're not fast, you're not worth using. And if you can't guarantee that the bits I get out are the exact same bits I put in, you're not worth using."


Big fan of `git commit -vp` here. Enables me to separate the commits according to concerns.


I suppose that in Git if you wanted to group a bunch of these commits together you could do so with a merge commit.


If I remember the history correctly, per-file commit messages were actually a feature that was quickly hacked in to get MySQL on board. It did not have that before those MySQL talks and I don't think it was very popular after.

Performance indeed killed bzr. Git was good enough and much faster, so people just got used to its weirdness.


> Git was good enough and much faster, so people just got used to its weirdness.

And boy is git weird! In Mercurial, I can mess with the file all day long after scheduling it for a commit, but one can forget that in git: marking a file for addition actually snapshots a file at addition time, and I have read that that is actually considered a feature. It's like I already committed the file, except that I didn't. This is the #1 reason why I haven't migrated from Mercurial to git yet, and now with Bitkeeper free and open source, chances are good I never will have to move to git. W00t!!!

I just do not get it... what exactly does snapshotting a file before a commit buy me?


It's probably the same idea as the one behind committing once in Mercurial and then using commit --amend repeatedly as you refine the changes. Git's method sounds like it avoids a pitfall in that method by holding your last changeset in a special area rather than dumping it into a commit so that you can't accidentally push it.

I often amend my latest commit as a way to build a set of changes without losing my latest functional change.


I always do a hg diff before I commit. If in spite of that I still screw up, I do a hg rollback, and if I already pushed, I either roll back on all the nodes, or I open a bug, and simply commit a bug fix with a reference to the bug in the bug tracking system. I've been using Mercurial since 2007 and I've yet to use --amend.


> In Mercurial, I can mess with the file all day long after scheduling it for a commit

OTOH, I find that behavior weird as I regularly add files to the index as I work. If a test breaks and I fix it, I can review the changes via git diff (compares the index to the working copy) and then the changes in total via git diff HEAD (compares the HEAD commit to the working copy).


Did you know you can do 'git add -N'? That will actually just schedule the file to be added, but won't snapshot it.


Cool. I've used bzr but never knew about per-file comments.


10 or even 20% better performance is not a feature. But when tools or features get a few times faster or more, their usage model changes - which means they become different features.


In short, RCS maintains a clean copy of the head revision, and a set of reverse patches to be applied to recreate older revisions. SCCS maintains a sequence of blocks of lines that were added or deleted at the same time, and any revision can be extracted in the same amount of time by scanning the blocks and retaining those that are pertinent.

Really old school revision control systems, like CDC's MODIFY and Cray's clone UPDATE, were kind of like SCCS. Each line (actually card image!) was tagged with the ids of the mods that created and (if no longer active) deleted it.


| CDC's MODIFY and Cray's clone UPDATE, were kind of like SCCS

Do you have references? I've heard of these but haven't come across details after much creative searching since they are common words.



Thank you! A peek into the (as far as I know) root node of source control history.


I've heard that too. It comes from card readers somehow.


I've read that "sourceware" article before, in the distant past when it was still a roughly accurate picture of the market (maybe 1995 or 1996). It's weird to read it again now, in a world that is so remarkably changed. Linux, the scrappy little upstart with a million or so users at the time of the paper, is now the most popular OS (or at least kernel) on the planet, powering billions of phones and servers. NT was viewed as the unfortunate but inevitable future of server operating systems.

I remember looking at IT jobs back then, and seeing a business world covered in Windows NT machines; I even got my MCSE (alongside some UNIX certifications that I was more excited about), because of it. Looking at jobs now, the difference is remarkable, to say the least. Nearly every core technology a system administrator needs to know is Open Source and almost certainly running on Linux.

And, the funny thing is that the general prescription (make a great Open Source UNIX) is exactly what it took to save UNIX. It just didn't involve any of the big UNIX vendors in a significant way (the ones spending a gazillion dollars on UNIX development at the time). Linux got better faster than Sun got smarter, and ate everybody's lunch, including Microsoft. Innovator's Dilemma strikes again.

Apple is an interesting blip on the UNIX history radar, too...though, they're likely to lose to the same market forces in the end, as phones become commodities. I'm a bit concerned that it's going to be Android, however, that wins the mobile world since Android is nowhere near the ideal OS from an Open Source and ethical perspective; but, I guess they got the bits right that Larry was suggesting needed to be right.

Anyway, it was a weird flashback to read that article. Things change, and on a scale that seems slow, until you look back on it, and see it's "only" been a couple of decades. In the grand scheme of things, and compared to the motion of technology prior to the 1900s, that's a blink of an eye.


Eh. Linux does have a LOT of problems. Well, so does everything, but it's not like it's "great." More like "good."

But yeah, it was weird that everybody thought NT was going to be the future. And now, MS has opensourced a good deal of infrastructure, is working with Node, has announced an integrated POSIX environment for Windows. And since it's in corporate, it might even be able to fix the fork(2) performance problems.


Great is a relative term. But, can you name an OS, especially a UNIX, that is better in the general case? By "general case", I mean, good for just about anything, even if there's something better for some niche or role. Also, take into account the world we live in: More computing happens on servers and phones than on desktops and laptops; and judge the OS based on how it's doing in those roles.

I sincerely consider Linux a great UNIX. Probably the best UNIX that's existed, thus far. There are warts, sure. Technically, Solaris had (and still has, in IllumOS and SmartOS) a small handful of superior features and capabilities (at this point one can list them on one hand, and one could also list some "almost-there" similar techs on Linux). But, I assume you've used Solaris (or some other commercial UNIX) enough to have an opinion...can you honestly say you enjoyed working on it more than Linux? The userspace on Solaris was always drastically worse than Linux, unless you installed a ton of GNU utilities, a better desktop, etc. But, Linux brought us a UNIX we could realistically use anywhere, and at a price anyone could afford. That's a miracle for a kid that grew up lusting after an Amiga 3000UX (because it was the closest thing to an SGI Indy I could imagine being able to afford).


Fair 'nuff. And no, I haven't used commercial UNIXes all that much, but I have experienced plenty of Linux's warts. I do agree with a lot of those points, but containers on Linux just aren't there, systemd is a mess that's going to get pushed in no matter what we say, and there are plenty of issues to be had, although the latter is true of any UNIX. If you want to know what the rest of the issues are, just start googling. And while you're at it, listen to some of Bryan Cantrill's talks. They are biased (obviously), but they're entertaining, and they do point out some things that I think are real problems (POSIX conformance (MADV_DONTNEED), and epoll semantics, mainly).

Oh, and don't flame me for speaking in ignorance. I've been a Linux user for half a decade at least now, and I CAN say I see problems with it. I can also say, as a person who is a programmer, that some of the things that Cantrill pointed out are actually evil. Note, however, that I don't claim Solaris, or any other OS, is better. Every UNIX is utterly fucked in some respect. I just know Linux's flaws the best.

By the way, I've been trying to get Amiga emulation working for a while. It basically works at this point, but the *UAEs are a misery on UNIX systems. Without any kind of loader, you have to spend 10 minutes editing the config every time you want to play a game. But if you're in any way interested in the history you lived through, check out youtube.com/watch?v=Tv6aJRGpz_A


Those issues seem so trivial with the benefit of hindsight and a memory of what it was like to deploy an application to multiple UNIX variants. Having one standard (we can call that standard "it ain't quite POSIX, but it runs great on Linux") is so superior to the mine field that was all of the UNIXen in 199x, that I don't even register it as a problem. Shoot, until you've had to use autotools or custom build a makefile for a half dozen different C compilers, kernels, libc, and so on, you don't know from POSIX "standards" pain.

But now my beard is showing and I'm ranting. My point is this: it took something from completely outside of the commercial UNIX ecosystem, so far out in left field that none of the UNIX bosses (or Microsoft) saw it as a threat until it was far too late...and it took something that was good, really good in at least some regards, that it would have passionate fans even very early in. Linux did that. And, compared to everything else (pretty much everything else that's ever existed, IMHO), it's great.

And, I'm on board the retro computing bandwagon. I have a real live Commodore 64 and an Atari 130xe. I'd like to one day find an Amiga 1200 in good shape, but because I live in an RV and travel fulltime, I don't have a lot of room to spare. But I do like to tinker and reminisce.


> until you've had to use autotools or custom build a makefile for a half dozen different C compilers, kernels, libc, and so on, you don't know from POSIX "standards" pain.

Truer words have ne'er been spoken. My first big boy job involved building and maintaining a large open source stack on top of AIX. These days I occasionally experience hiccups related to OpenBSD not being Linux. Problems aren't even in the same league. That said, the thrill of getting stuff to work on AIX was certainly greater (and purchased with more human suffering).


Ah, the agony that I can only imagine. The many Linux distros are bad enough...


You know, I think you're right. You have a good point. Thanks.

And I'd love to have some real retro computers, but I've got no money, and most of the really interesting ones are from the UK. Ah well...


> Great is a relative term. But, can you name an OS, especially a UNIX, that is better in the general case? By "general case", I mean, good for just about anything, even if there's something better for some niche or role. Also, take into account the world we live in: More computing happens on servers and phones than on desktops and laptops; and judge the OS based on how it's doing in those roles.

Okay then, SmartOS. Why is an exercise left for the reader, because it would just take too much and too long to list and explain all the things it does better, faster and cheaper than Linux in server space; that's material rife for an entire book.

> can you honestly say you enjoyed working on it more than Linux?

Enjoyed it?!? Love it, I love working with Solaris 10 and SmartOS! It's such a joy to have an OS that isn't broken, one which actually does what it is supposed to do (run fast, be efficient, protect my data, be designed to be correct). When I am forced to work with Linux (which I am, at work, 100% of the time, and I hate it), it feels like I am on an operating system from the past century: ext3 / ext4 (no XFS for us yet, and even that is ancient compared to ZFS!), memory overcommit, data corruption, no backward compatibility, navigating the minefield of GNU libc and userland misfeatures and "enhancements". It's horrible. I hate it.

> The userspace on Solaris was always drastically worse than Linux,

Are you kidding me? System V is great; it's grep -R and tar -z that I hate, because they only work on GNU! Horrid!!!

> But, Linux brought us a UNIX we could realistically use anywhere, and at a price anyone could afford.

You do realize that if you take an illumos derived OS like SmartOS and Linux, and run the same workload on the same cheap intel hardware, SmartOS is usually going to be faster, and if you are virtualizing, more efficient too, because it uses zones, right? Right?

It's like this: when I run SmartOS, it's like I'm gliding around in an ultramodern, powerful, economical mazda6 diesel (the 175 HP / 6 speed Euro sportwagon version); I slam the gas pedal and I'm doing 220 km/h without even feeling it and look, I'm in Salzburg already! When I'm on Linux, I'm in that idiotic Prius abomination again: not only do I not have any power, but I end up using more fuel too, even though it's a hybrid, and I'm on I-80 somewhere in Iowa. That's how I'd compare SmartOS to Linux.


> It's like this: when I run SmartOS, it's like I'm gliding around in an ultramodern, powerful, economical mazda6 diesel (the 175 HP / 6 speed Euro sportwagon version);

"Tout ce qui est excessif est insignifiant"


"Yeah Tenzin, I... still don't speak that."


That was an awesome rant. You must work with Brian.

Edit: Bryan Cantrill, spelled it wrong.


Nothing would make me happier professionally than to have the opportunity to work with Bryan (sadly, we've never met, although I did work in Silicon Valley for a while). For instance, those times when I wouldn't be writing C, I could finally have an orgy of AWK one-liners and somebody would appreciate it without me having to defend why I used AWK.


I'm struck by how much this sounds like a Linux fan ranting back in 1995, when Windows and "real" UNIX was king. The underdog rants were rampant back then (I'm sure I penned a few of them myself).

I think the assumption you're making is that people choose Linux out of ignorance (and, I think the ignorance goes both ways; folks using Solaris have been so accustomed to Zones, ZFS, and dtrace being the unique characteristic of Solaris for so long that they aren't aware of Linux' progress in all of those areas). But, there are folks who know Solaris (and its children) who still choose Linux based on its merits. We support zones in our products/projects (because Sun paid for the support, and Joyent supported us in making Solaris-related enhancements), and until a few years ago it was, hands-down, the best container technology going.

Linux has a reasonable container story now; the fact that you don't like how some people are using it (I think Docker is a mess, and I assume you agree) doesn't mean Linux doesn't have the technology for doing it well built in. LXC can be used extremely similarly to Zones, and there's a wide variety of tools out there to make it easy to manage (I work on a GUI that treats Zones and LXC very similarly, and you can do roughly the same things in the same ways).

"Are you kidding me? System V is great, it's grep -R and tar -z that I hate, because it only works on GNU! Horrid!!!"

Are you really complaining about being able to gzip and tar something in one command? Is that a thing that's actually happening in this conversation?

I'll just say I've never sat down at a production Sun system that didn't already have the GNU utilities installed by some prior administrator. It's been a while since I've sat down at a Sun system, but it was standard practice in the 90s to install GNU from the get go. Free compiler that worked on every OS and for building everything? Hell yes. Better grep? Sure, bring it. People went out of their way to install GNU because it was better than the system standard, and because it opened doors to a whole world of free, source-available, software.

"You do realize that if you take an illumos derived OS like SmartOS and Linux, and run the same workload on the same cheap intel hardware, SmartOS is usually going to be faster"

Citation needed. Some workloads will be faster on SmartOS. Others will be faster on Linux. Given that almost everything is developed and deployed on Linux first and most frequently, I wouldn't be surprised to see Linux perform better in the majority of cases; but, I doubt it's more than a few percent difference in any common case. The cost of having or training staff to handle two operating systems (because you're going to have to have some Linux boxes, no matter what) probably outweighs buying an extra server or two.

"and if you are virtualizing, more efficient too, because it uses zones, right? Right?"

Citation needed, again. Zones are great. I like Zones a lot. But, Linux has containers; LXC is not virtualization, it is a container, just like Zones. Zones has some smarts for interacting with ZFS filesystems and that's cool and all, but a lot of the same capabilities exist with LVM and LXC.

I feel like you're arguing against a straw man in a lot of cases here.

Why do you believe LXC (or other namespace-based containers on Linux) are inherently inefficient, compared to Zones, which uses a very similar technique to implement?

And, it's not Linux' fault the systems you manage are stuck on ext4. There are other filesystems for Linux; XFS+LVM is great. A little more complex to manage than ZFS, but not by a horrifying amount. So, you have to read two manpages instead of one. Not a big deal. And, there's valid reasons the volume management and filesystem features are kept independent in the kernel (I dunno if you remember the discussions about ZFS inclusion in Linux; separate VM and FS was a decision made many years ago, based on a lot of discussion). Almost any filesystem on Linux has LVM, so, filesystems on Linux get snapshots and tons of other features practically for free. That's pretty neat.

Anyway, I think SmartOS is cool. I tinker with it every now and then, and have even considered doing something serious with it. But, I just don't find it compellingly superior to Linux. Certainly not enough to give up all of the benefits Linux provides that SmartOS does not (better package management, vast ecosystem and community, better userland even now, vastly better hardware support even on servers, etc.).


I predate the Solaris stuff, I'm not a fan. I liked SunOS which was a bugfixed and enhanced BSD. When I wrote the sourceos paper I was talking about SunOS. (I lied, I overlapped with Solaris but I try and block that out)

All that said, Sun had an ethos of caring. In the early days of BitMover, Amy had some quote about the Sun man pages versus the Linux man pages; if someone can find that, it's awesome. We keep a Sun machine in our cluster just so we can go read sane man pages about sed or ed or awk or whatever. Linux man pages suck.

Sun got shoved into having to care about System V and it sucked. I hated it and left, so did a bunch of other people. But Sun carried on and the ethos of caring carried on and Bryan and crew were a big part of that. My _guess_ is that Solaris and its follow ons are actually pleasant. I'll be pissed if I install it and it doesn't have all the GNU goodness. If that's the case then you are right, they don't get it.

What I expect to see is goodness plus careful curating. That's the Sun way.


I agree that GNU man pages suck. At least the atrocity that was "this man page is a stub, use info for the real docs" is gone now (I don't know if GNU stopped trying to force me to use info, or if distros fix it downstream). I have always hated info and the persistent nagging that GNU docs used to try to make people use it.

SmartOS is nice. I've always thought so and I have a lot of respect for the folks working on it. But, it isn't nice enough to overcome the negatives of being a tiny niche system. Linux has orders of magnitude more people working on it (and many of those people are also very smart). That's hard to beat.


On the subject of manual pages, a few days ago: https://news.ycombinator.com/item?id=11643347


So, why isn't any major hosting provider offering a multi-tenant Linux container hosting service directly on bare-metal Linux servers, whereas Joyent is providing Docker-compatible container hosting on SmartOS using LX-branded zones? Does anyone trust the security of Linux namespaces and cgroups in a multi-tenant environment? That's the one thing that SmartOS really seems to have going for it.


It's not just a matter of trust: namespaces and cgroups break like graham crackers. But you knew that already. ;)


> Citation needed, again. Zones are great. I like Zones a lot. But, Linux has containers; LXC is not virtualization, it is a container, just like Zones. Zones has some smarts for interacting with ZFS filesystems and that's cool and all, but a lot of the same capabilities exist with LVS and LXC.

How about simple logic instead? I know zones work, because they have been in use in the enterprises since 2006, and they are easy to work with and reason about; if I have the same body of software available on a system with the original lightweight virtualization as I do on Linux, and my goal is data integrity, self-healing, and operational stability, what is my incentive to running a conceptual knock-off copy of zones, LXC? To me, the choice is obvious: design the solution on top of the tried and tested, original substrate, rather than use a knock-off, especially since the acquisition cost of both is zero, and I already know from experience that investing in zones pays profit and dividends down the road, because I ran them before in production environments. I like profits, and the only thing I like better than engineering profits are engineering profits with dividends. That, and sleeping through my nights without being pulled into emergency conference calls about some idiotic priority 1 incident. Incident which could have easily been avoided altogether, if I had been running on SmartOS with ZFS and zones. Based on multiple true stories, and don't even get me started on the dismal redhat "support", where redhat support often ends up in a shootout with customers[1], rather than fixing customer's problems, or being honest and admitting they do not have a clue what is broken where, nor how to fix it.

> And, it's not Linux' fault the systems you manage are stuck on ext4. There are other filesystems for Linux; XFS+LVM is great.

Did you know that LVM is an incomplete knock-off of HP-UX's LVM, which in turn is a licensed fork of Veritas' VxVM? Again, why would I waste my precious time, and run up financial engineering costs running a knock-off, when I can just run SmartOS and have ZFS built in? The logic does not check out, and financial aspects even less so.

On top of that, did you know that not all versions of the Linux kernel provide LVM write barrier support? And did you know that not all versions of the Linux kernel provide XFS write barrier support (XFS at least will report that, while LVM will do nothing and lie that the I/O made it to stable storage, when it might still be in transit)? And did you know that to have both XFS and LVM support write barriers, one needs a particular kernel version, which is not supported in all versions of RHEL? And did you know that not all versions of LVM correctly support mirroring, and that for versions which do not require a separate logging device, the log is in memory, so if the kernel crashes, one experiences data corruption? And did you know that XFS, as awesome as it is, does not provide data integrity checksums?

And we haven't even touched upon systemd knock-off of SMF, nor have we touched upon lack of fault management architecture, nor have we touched upon how insane bonding of interfaces is in Linux, nor have we touched upon how easy it is to create virtual switches, routers and aggregations (trunks in CISCO parlance) using Crossbow in Solaris/illumos/SmartOS... when I wrote that there is enough material for a book, I was not trying to be funny.

[1] http://bugzilla.redhat.com/


The Linux "knock-off" of SMF is not systemd. It is SystemXVI. Roughly speaking.

* https://news.ycombinator.com/item?id=10212770

* https://github.com/ServiceManager/ServiceManager/blob/master...


> I'm struck by how much this sounds like a Linux fan ranting back in 1995, when Windows and "real" UNIX was king. The underdog rants were rampant back then (I'm sure I penned a few of them myself).

It sounds like a Linux fan ranting circa 1995 because that is precisely what it is: first came the rants. Then a small, underdog company named "redhat" started providing regular builds and support, while Linux was easily accessible, and subversively smuggled into enterprises. Almost 20 years later, Linux is now everywhere.

Where once there was Linux, there is now SmartOS; where once there was redhat, there is now Joyent. Where once one had to download and install Linux to run it, one now has but to plug in a USB stick, or boot SmartOS off of the network, without installing anything. Recognize the patterns?

One thing is different: while Linux has not matured yet, as evidenced, for example, by GNU libc, or by GNU binutils, or the startup subsystem perturbations, SmartOS is based on a 37-year-old code base which matured and reached operational stability about 15 years ago. The engineering required for running the code base in the biggest enterprises and government organizations has been conditioned by large and very large customers having problems running massive, mission critical infrastructure. That is why, for instance, there are extensive, comprehensive post-mortem analysis as well as debugging tools, and the mentality permeates the system design: for example, ctfconvert runs on every single binary and injects the source code and extra debugging information during the build; no performance penalty, but if you are running massive real-time trading, a database or a web cloud, when the going gets tough, one appreciates having the tools and the telemetry. For Linux that level of system introspection is utter science fiction, 20 years later, in enterprise environments, in spite of attempts to the contrary. (Try Systemtap or DTrace on Linux; try doing a post-mortem debug on the kernel, or landing into a crashed kernel, inspecting system state, patching it on the fly, and continuing execution; go ahead. I'll wait.) All that engineering that went into Solaris and then illumos and now SmartOS has passed the worst trials by fire at the biggest enterprises, and I should know, because I was there, at ground zero, and lived through it all.

All that hard, up-front engineering work that was put into it since the early '90's is now paying off, with a big fat dividend on top of the profits: it is trivial to pull down a pre-made image with imgadm(1M), feed a .JSON file to vmadm(1M), and have a fully working yet completely isolated UNIX server running at the speed of bare metal, in 25 seconds or less. Also, let us not forget almost ~14,000 software packages available, most of which are the exact same software available on Linux[1]. If writing shell code and the command line isn't your cup of tea, there is always Joyent's free, open source SmartDC web application for running the entire cloud from a GUI.

Therefore, my hope is that it will take SmartOS less than the 18 years it took Linux to become king, especially since cloud is the new reality, and SmartOS has been designed from the ground up to power massive cloud infrastructure.

> I think the assumption you're making is that people choose Linux out of ignorance

That is not an assumption, but rather my very painful and frustrating experience for the last 20 years. Most of those would-be system administrators came from Windows and lack the mentoring and UNIX insights.

> (and, I think the ignorance goes both ways; folks using Solaris have been so accustomed to Zones, ZFS, and dtrace being the unique characteristic of Solaris for so long that they aren't aware of Linux' progress in all of those areas).

I actually did lots and lots of system engineering on Linux (RHEL and CentOS, to be precise) and I am acutely aware of the limitations when compared to what Solaris-based operating systems like SmartOS can do: not even the latest and greatest CentOS or RHEL can guarantee me basic data integrity, let alone backwards compatibility. Were we in the '80s right now, I would be understanding, but if after 20 years a massive, massive army of would-be developers is incapable of getting basic things like data integrity, scheduling, or the startup/shutdown (init) subsystem working correctly, in the 21st century, I have zero understanding and zero mercy. After all, my time as a programmer and as an engineer is valuable, and there is also a financial cost involved, which is not negligible either.

> Linux has a reasonable container story now; the fact that you don't like how some people are using it (I think Docker is a mess, and I assume you agree)

Yes, I agree. The way I see it, and I've deployed very large datacenters where the focus was operational stability and data correctness, Docker is a web 2.0 developer's attempt to solve those problems, and they are flailing. Dumping files into pre-made images does not compensate for a lack of experience in lifecycle management, or a lack of experience in process design. No technology can compensate for the lack of a good process, and good process requires experience working in very large datacenters where operational stability and data integrity are primary goals. Working in the financial industry where tons of money are at stake by the second can be incredibly instructive and insightful when it comes to designing operationally correct, data-protecting, highly available and secure cloud-based applications, but the other way around does not hold.

> Are you really complaining about being able to gzip and tar something in one command? Is that a thing that's actually happening in this conversation?

Let's talk system engineering:

gzip -dc archive.tar.gz | tar xf -

will work everywhere; I do not have to think whether I am on GNU/Linux, or HP-UX, or Solaris, or SmartOS, and if I have the above non-GNU invocation in my code, I can guarantee you, in writing, that it will work everywhere without modification. If on the other hand I use:

tar xzf archive.tar.gz

I cannot guarantee that it will work on every UNIX-like system, and I know from experience I would have to fix the code to use the first method. Therefore, only one of these methods is correct and portable, and the other one is a really bad idea. If I understand this, then why do I need GNU? I do not need it, nor do I want it. Except for a few very specific cases like GNU Make, GNU tools are actually a liability. This is on GNU/Linux, to wit:

  % gcc -g hello.c -o hello
  % gdb hello hello.c
  GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-45.el5)
  ...
  ...
  ... 
  Dwarf Error: wrong version in compilation unit header (is 4, should be 2) [in module /home/user/hello]
  "/home/user/hello.c" is not a core dump: File format not recognized

  (gdb)
Now, why did that happen? Because the compiler emitted DWARF version 4 debugging information, while the debugger delivered with the OS only understands DWARF version 2. Something like that is unimaginable on illumos, and by extension, SmartOS. illumos engineers would rather drop dead than cause something like this to happen.

On top of that, on HP-UX and Solaris I have POSIX tools, so for example POSIX extended regular expressions are guaranteed to work, and the behavior of POSIX-compliant tools is well documented, well understood, and guaranteed. When one is engineering a system, especially a large distributed system which must provide data integrity and operational stability, such concerns become paramount, not to mention that the non-GNU approach is cheaper because no code must be fixed afterwards.
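
(As a concrete illustration, a tiny sketch that relies on nothing beyond POSIX sh, gzip, tar and grep:)

  # portable extraction: behaves identically on Solaris, HP-UX, the BSDs and GNU/Linux
  extract() {
      gzip -dc "$1" | tar xf -
  }
  extract archive.tar.gz

  # POSIX extended regular expressions via grep -E, no GNU-only extensions
  ls | grep -E '^release-[0-9]+\.[0-9]+$'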

[1] http://www.perkin.org.uk/posts/building-packages-at-scale.ht...


So it seems that my innocent suggestion that Linux isn't perfect may have spawned a tiny... massive holy war. Great. You know what? SmartOS is fantastic. It's great. But Linux isn't terrible. They both have their flaws. Like SmartOS not having a large binary footprint, and not having the excellent package repositories. And the fact that every partisan of one hates /proc on the other. And the fact that KVM is from Linux. And that the Docker image is actually a pretty good idea. Or the fact that SMF and launchd were some of the inspirations for systemd. Okay, now I'm just tossing fuel on the fire, by the bucketload.

Personally, I run Linux on my desktop. Insane, I know, but I can't afford a Mac, and the OS X POSIX environment just keeps getting worse. Jeez. At this rate, Cygwin and the forthcoming POSIX environment from MS will be better. But anyways, I'm not switching my system to BSD or Illumos anytime soon, despite thinking that they are Pretty Cool (TM). Why? The binary footprint. Mostly Steam. Okay. Pretty much just Steam. Insane, I know, but I'm not running (just) a server.

So in summary, all software sucks, and some may suck more than others, but I'm not gonna care until Illumos and BSD get some love from NVIDIA.

And why do you care what a crazy person thinks? Oh, you don't. By all means continue the holy war. Grab some performance statistics, and hook in the BSDs. I'll be over heating up the popcorn...


> Like SmartOS not having a large binary footprint, and not having the excellent package repositories.

http://www.perkin.org.uk/posts/building-packages-at-scale.ht...

> And the fact that KVM is from linux.

Actually it's great that it's from Linux, because one major point of embarrassment for Linux is that KVM runs faster and better on SmartOS than it does on Linux, thanks to Joyent engineers systematically using DTrace during the porting effort:

https://www.youtube.com/watch?v=cwAfJywzk8o

> Insane, I know, but I can't afford mac

Once you're able to afford one, you won't care about the desktop ever again, because your desktop will JustWork(SM).

> but I'm not gonna care until Illumos and BSD get some love from NVIDIA.

NVIDIA provides regular driver updates for both Solaris and BSD. I bought an NVIDIA GTX 980, downloaded the latest SVR4 package for my Solaris 10 desktop, and one pkgrm && pkgadd later, I was running accelerated 3D graphics on the then latest-and-greatest accelerator NVIDIA had for sale.

> By all means continue the holy war. Grab some performance statistics, and hook in the BSDs.

They take our code, and we take theirs; they help us with our bugs, and we help them with theirs. The BSDs are actually great. The BSDs have smart, capable, and competent engineers who care about engineering correct systems and writing high quality code. We love our BSD brethren.


...Annnd the BSDs are involved. Now all we need is some actual fans, and the Linux hackers should start to retaliate...

But good to know that the Solaris and BSD NVIDIA drivers work. If they work with lx-branding, I might actually consider running the thing.

>Once you're able to afford one, you won't care about the desktop ever again, because your desktop will JustWork(SM).

Yeah, no. It seems like OS X is making increasingly radical changes that make it increasingly hard for applications expecting standard POSIX to run. By the time I get the cash, nothing will work right.


"I'm a bit concerned that it's going to be Android, however, that wins the mobile world since Android is nowhere near the ideal OS from an Open Source and ethical perspective; but, I guess they got the bits right that Larry was suggesting needed to be right."

They've already won; Apple isn't coming back (did they "go thermonuclear" in the end or did that nonsense die with Mr Magical Thinking?).

Don't confuse Android with Google; you can grab the source and do what you want with it, like Cyanogenmod have, or like millions of hobbyists are doing themselves. It's all available with bog standard open source licenses - no need to worry about ethics.


Not really. AOSP died when Jean-Baptiste Queru quit in protest over AOSP not really being open source. He works at Yahoo now.

http://www.engadget.com/2013/08/07/aosp-maintenance-head-lea...


Was that before or after Android overtook iOS though? I'm not that bothered about one person's opinion on the licence used.


This is a bit funny because Mercurial is partly named after Larry.

https://groups.google.com/d/msg/mercurial_general/c3_SM3p7S1...


Just for the record this thread is where I learned that. And yup, it sorta fits. For better or worse.


The point about small communities is great. I've never seen it spelled out like that, but it captures what I was thinking perfectly. People get caught up on popularity as the only meaningful metric for success, but it really isn't.


> And you can say to BK and Larry now that it's "too late", just as Larry told Sun in 2005,

In case of software, it is never too late: as you once put it so eloquently, software does not suddenly stop working and does not have an expiration date.

If this software works and works well, then Paul Graham's revolutionary idea applies: when you choose technology, you have to ignore what other people are doing and consider only what will work best. (Common sense really, but apparently not to the rest of our industry.)

If this software works best, and does exactly what I want and need it to do, I have enough experience to know not to care that everyone else runs something like git just because that is trendy right now. (A lesson appreciated by those who run SmartOS because it is the best available technology for virtualization, cloud, and performance, instead of running Linux and Docker.)



Sun is also no more... and it's not at all clear they would've survived had Solaris been open sourced a decade earlier.

Yes, you're absolutely right that there are a ton of startups built on opensolaris (who have proprietary code they haven't and don't intend to ever give back to the community), and there is smartos/omnios/illumos as well. But none of those projects would have in any way contributed to the health of Sun Microsystems, nor provided the funding to get Solaris to where it is today. ZFS may have never seen the light of day if Solaris were open sourced in 1995.


> ZFS may have never seen the light of day if Solaris were open sourced in 1995.

It depends on how that would have affected Jeff Bonwick. If it kept him from deciding that Sun ought to develop a new filesystem, promising Matthew Ahrens a job writing one out of college and working together with Matt on it, ZFS would never have existed.


Some history. I was at Sun and Bob Hagmann was teaching at Stanford and got me to be a TA there. He retired and Stanford asked me if I would teach CS240B, so I did. Jeff Bonwick was a student and I recognized his ability and recruited him to Sun. He said "I have no experience programming in C" and I said "You are smart. I can teach you C, I can't teach you smart".

I also told him that he'd go way farther at Sun than I did and I was right, I think he made DE (Distinguished Engineer), I didn't. He played the game better. Smart guy. Him, Bryan, Bill Moore, those guys were the new Sun in my mind.


In that case, you are one of the guys on whose shoulders much of what I have done stands. Thank you.


I actually posted before I saw your comment. I guess I was right. So are you happy to finally have an open source bring-over-modify-merge VCS whose command set makes sense?


BitKeeper is a great example of what happens when you do not open source your code. I have cited it that way many times.


Except that we've been around for 18 years and made payroll without fail that entire time. Supported a team of 10-15 people every year. That's something many, many companies in the valley, including many that open sourced everything, have not done as well.

You may have done more by open sourcing whatever it is that you have done; if so congrats.


My remark was intended to cite how much farther BitKeeper could have gone had it been open source from the start rather than belittle what BitKeeper accomplished.

At work, many of my newer colleagues have backgrounds in closed source software development. We are developing software that has no exact analog to existing software, and we hope that it will have a big impact. If it becomes as important as we think it could be, then BitKeeper vs. git is a fantastic example of why our work should be open source from the start.


Except you're not Linus and you probably don't have a cult like following for any work you produce.

Linus brought DVCS to the masses, but to pretend there wasn't more at play than simply open sourcing a project and hoping it all works out is complete rubbish. People have families to feed. Closed source is not inherently evil.

It takes a unique situation to produce something like git, whose product is beyond the sum of the project itself.


I wrote enough patches to ZFSOnLinux that I have the distinction of number 2 by commit count. It was a hobby for me at first and quite frankly, I never expected it to make a difference for more than a few hundred people. Now ZoL is on millions of systems through Ubuntu in part because of my work and there are far more places using it than I can count.

Open sourcing those patches rather than keeping them to myself made a difference that was greater than anything I imagined. Similarly, the impact of making ZFS open source far surpassed the expectations of the original team at Sun. I think that making any worthwhile piece of software open source will lead to adoption beyond the scope of what its authors envisioned. All it takes is people looking for something better than what previously existed.

As for closed source being inherently evil (your words, not mine), how do you fix bugs in closed source software that a vendor is not willing to fix? How do you catch things like a hard coded password that gives root privileges? How do you know that the software is really as good as they say? It is far easier with open source software than with closed source software. Closed source software is a bad idea.


I still think you need to look at it from the point of view of an employer. Like me. I'm weird, I really care about my people, our company is more like a cooperative than anything else.

I grew this to a place where I could pay salaries. Doing so was super super hard. I had a lot of scary nights where I thought I couldn't make payroll. Just building up to a place where the next payroll was OK was a big deal for us.

So open source it? When you finally got to the point where you can pay people without worrying all night?

I get that you see that open source is the answer, and it is for some stuff. For me, jumping on that years ago was asking too much.


I have no idea why people are down voting this. In an alternative universe, we would all be using BitKeeper. The reason we are not is mainly because Larry McVoy ceded the market to git and mercurial because he was afraid of disrupting his existing business. git and mercurial would never have existed had he practiced at BitMover what he preached at Sun.


For people who don't know the history -- McVoy offered free bitkeeper licenses to various open source projects, and the Linux kernel switched to it.

After Andrew Tridgell (SAMBA, among other projects) reverse-engineered the bitkeeper protocol [1] in order to create his own client, the license was rescinded for everyone.

As a result, Linus wrote git.

[1] https://lwn.net/Articles/132938/


> As a result, Linus wrote git.

And mpm wrote hg, never forget:

http://lkml.iu.edu/hypermail/linux/kernel/0504.2/0670.html

http://lwn.net/Articles/151624/


It's astonishing to me that Git has won out given how much easier it's been for me to explain Hg to other people than to explain Git. To this day, in our SVN workflow at my company, nontechnical people who have merely seen an Hg diagram on a whiteboard by my desk immediately grasp the idea and the lingo, and ask me questions like "hey, can you branch the code to commit those changes and push them to the testing server? This thing's really cool and we don't mind playing with the alpha version, but we might scrap it all later."


Maybe I've been a git user for too long, but I really fail to see why git as of the last 5 years is any harder to explain than hg. Personally I think the branching in hg is pretty much broken; the fact alone that it's pretty much impossible to get rid of branches is horrible.


Because the diagrams for hg are very simple, there is a really simple way to do branching that obviously works and commits a relatively forgivable sin: just `cp -a` the folder.

Now, I know that that's in essence an admission of defeat! I'm not pretending that it's anything less than that. However, this is also the easiest explanation of, and model for, branching that anyone has ever created. The explanation of branching which the nontechnical user immediately understood was, in fact, just having a couple of these repository-folders sitting side by side with different names, `current_version` and `new_feature`. It is a model of branching that is so innocent and pure and unsullied by the world that a nontechnical person got it with only a couple of questions.
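
(In hg commands, that whiteboard model is literally just the following, using the repository names from the example above:)

  hg clone current_version new_feature   # "branching" = a second folder
  cd new_feature
  # ...hack...
  hg commit -m "try the new feature"
  # keep it: push it back; scrap it: just delete the folder
  hg push ../current_version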

Like I said, I'm actually employed at an SVN shop, where branches are other folders in the root folder and the workflow is less "push this to the testing server" and more "commit to the repository, SSH into the testing server, and then update the testing server." But that Hg model resonated with someone who doesn't know computers. To me, that was a moment of amazement.

I'm not even saying which one is better really; I like Git branches too! It's just that I'm astonished that the more confusing DVCS is winning. Most people's approach to Git is "I am just going to learn a couple of fundamentals and ask an expert to set up something useful for me." I would guess that most Git users don't branch much; they never learned that aspect of it. I'm really surprised that software developers aren't more the sort to really say "why am I doing this?" and to prefer systems which make it easier to answer those questions with pretty pictures.


> I'm really surprised that software developers aren't more the sort to really say "why am I doing this?" and to prefer systems which make it easier to answer those questions with pretty pictures.

The network effect should explain it.

That said, using pictures to answer questions is fairly sadistic when those asking them are blind. I know a blind developer and I never use pictures when talking to him in IRC.


How did Git's network effect get started in the first place?


It was made specifically for managing the Linux kernel, which has a huge number of contributors all doing their thing in different parts.


They do answer the questions with pictures. But the pictures look like http://xkcd.com/1597/ and the answers are perhaps not what you are expecting. (-:


I think that the hate that gets lumped on mercurial's branches, though understandable, is a bit unfair.

Disclaimer: it's been years since I used hg as my primary DVCS, so some of my thoughts here might be out of date or misrecollections.

> branching in hg is pretty much broken

It really isn't. It's absolutely not suited for the task that many people want to use it for, but it's totally fitting with the intended use case and the "history is immutable" philosophy of mercurial.

Using mercurial branches for anything resembling feature branching is a bad idea. But mercurial branches are perfect for things like ongoing lines of development. So, for a project like PostgreSQL, you'd have a "master" (default) branch for the head of development and then once a release goes into maintenance mode you create a new branch for "postgres-9.4" and any fixes that need to be applied to that release will be made to the maintenance branch.

Following hg's "immutable history" policy the fact that the commit was performed on a maintenance branch is tracked forever. And it should be because the purpose of your source control is to track those kinds of things: "This is the branch we used for maintenance releases of version X.Y.Z, it is now closed since we no longer support that version"
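
(For the maintenance-line use case described here, the commands look roughly like this; the branch name is borrowed from the PostgreSQL example, and the release revision is a placeholder:)

  # when 9.4 goes into maintenance, start its named branch
  hg update <release-revision>
  hg branch postgres-9.4         # the next commit, and its descendants, carry this name
  # ...apply a fix...
  hg commit -m "First fix on the 9.4 maintenance branch"

  # later fixes land on the same line
  hg update postgres-9.4
  # ...fix...
  hg commit -m "Backport another fix"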

The issues with mercurial's branches are:

- For a long time they were the only concept in hg that had a simple name and looked like "multiple lines of development". Even though hg supported multiple heads and multiple lightweight clones, neither of those had commands or features with a clear and simple name, so people turned to "branches" expecting them to do what they wanted even when they were a bad fit.

- "branch" is very general name that is often used (quite rightly) to refer to a bunch of slightly different ways of working with multiple concurrent versions. In general use it might refer simply having 2 developers who both produce independent changes from the same parent. Or to intentionally having multiple short lived lines of development based around feature. Or splitting of development right before a release so that the "release branch" is stable. Etc. Yet the feature in hg that is called "branch" is useful for only some of those things. It would have been better to call it a "development line" or something like that.

- It took far too long for hg to get a builtin way to refer to named heads (bookmarks). The model assumed that each repository (clone) would only ever want to have 1 head on each branch (development line) and that producing multiple heads was a problem that ought to be resolved as soon as possible. There's a lot of history behind that approach (almost every CVS and SVN team I ever worked with did that), but DVCS tools made it easier to move away from that, yet official hg support lagged.

So even today, the "branch" concept in hg is only useful for a small number of cases, and the "bookmarks" concept is what most people want, but they're separate things with names that don't align with expectations.


This distinction between branches and bookmarks looks like one of the things missing the most from git. Grab a random branch from some place and try to guess whether the committer intends to rewrite it in the future or not: good luck.

For the rest: https://stevebennett.me/2012/02/24/10-things-i-hate-about-gi...


> So even today, the "branch" concept in hg is only useful for a small number of cases, and the "bookmarks" concept is what most people want, but they're separate things with names that don't align with expectations.

And it doesn't help that the primary hosted repository system is Bitbucket and Bitbucket didn't support pull requests from bookmarks last time I checked.


So what about all this contradicts "branching in hg is pretty much broken"?


Your original comment linked broken branches with the fact that they can't be deleted. That's only true if you intended to talk about Mercurial named branches, which aren't broken; they just aren't what you want them to be.

Bookmarks (today, and for several years) work just fine. So "branching" in the general sense isn't broken even though the combined feature set is a bit haphazard.

That bitbucket doesn't work well with bookmarks is a sign of how little Atlassian cares about hg, rather than an hg issue.

If you're arguing that the hosting options for hg are limited and fall far below the git options, then I'm not going to disagree.


In other words, in Git, a branch is just a named pointer to a particular revision. In Hg, these are called 'bookmarks' and they work exactly how you're imagining; and there is an immutable sort of bookmark that is called a 'tag' (bookmarks can be repointed; tags cannot). By creating a new head (i.e. branching) and naming that new head with a bookmark, you do exactly what `git branch` does.

Mercurial also supports auto-naming of revisions which automatically applies to all child revisions: these are meant to be independent lines of development, each with its own head revision, and are called 'named branches'; that is what `hg branch` does. The problem that you're identifying (and that I agree is counterintuitive!) is that these names become part of the commits themselves and therefore public knowledge. Mercurial warns you when you `hg branch` that this is happening and says "did you want a bookmark?" but does not tell you, e.g., "to undo what you just did, type `hg branch default`."
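
(Side by side, the equivalence being described, with a throwaway name for illustration:)

  # git: a branch is a movable pointer; create it and switch to it
  git checkout -b feature-x

  # hg: the same idea is a bookmark, created active at the working parent
  hg bookmark feature-x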


This applied to me as well. I like the metaphor someone wrote that Git is the assembly language of DVCS.


Maybe so, but it's missing some important instructions having to do with directories and renames.


"hey, can you branch the code to commit those changes and push them to the testing server? This thing's really cool and we don't mind playing with the alpha version, but we might scrap it all later."

Are they talking about hg or git here? Because that flow in git is:

  git branch
  git checkout
  git commit
  git push
The only thing that git adds to that workflow is that creating a new branch doesn't immediately move you onto it (also that most would use checkout -b to do both). And it's not immediately obvious that a non-technical user would need to know about that in order to get the above point across.


Ahahahahahaha. You think that works!

No. That fails with the following semi-helpful error message:

    remote: error: refusing to update checked out branch: refs/heads/master
    remote: error: By default, updating the current branch in a non-bare repository
    remote: error: is denied, because it will make the index and work tree inconsistent
    remote: error: with what you pushed, and will require 'git reset --hard' to match
    remote: error: the work tree to HEAD.
    remote: error: 
    remote: error: You can set 'receive.denyCurrentBranch' configuration variable to
    remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into
    remote: error: its current branch; however, this is not recommended unless you
    remote: error: arranged to update its work tree to match what you pushed in some
    remote: error: other way.
    remote: error: 
    remote: error: To squelch this message and still keep the default behaviour, set
    remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
Actually all of these diagrams for Git need to look substantially more complicated because you first off need to introduce repositories which have a cylinder with a cloud over them (the cloud of course is the staging area) with a sort of recycle-reduce-reuse pattern of arrows `add`, `commit`, `checkout` between these three entities, with the caveat that `checkout` is only kinda-sorta what you're looking for with this. In fact there is a cylinder-to-cylinder `pull`-type operation called `fetch`, but `git fetch; git checkout` will not actually update any files, revealing the gaping hole in this simple picture, and you'll have to type `git status` to find out that you're directed to do a `git pull` anyway, which has to be diagrammed as an arrow pointing from the remote cylinder, bouncing off the local cylinder, and then pointing at the local folder.
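
(In concrete commands, the fetch/pull distinction being complained about, assuming a remote named origin and a branch named master:)

  git fetch origin            # updates the local repository (.git) only; no working files change
  git merge origin/master     # ...now actually bring the working tree up to date
  # or the two steps rolled into one:
  git pull origin master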

To get to talk about `push` you then need to introduce the SVN-style "bare repository" in the diagram, a folder-box with the cylinder now drawn large inside it, and explain that this folder exists only to contain the .git subfolder and act as an SVN-style repository. You can then draw `pull` arrows down from it and `push` arrows up to it.

Then the workflow is more SVN-style:

    git branch
    git checkout
    git commit
    git push
    ssh testing-server
    cd git-repository
    git pull
Now that almost works, except the `git branch; git checkout` flow is not the proper way to push changes in the working directory to the new branch. (The context of the conversation was stuff that was already being developed, presumably on the master branch.) That fails on the checkout with an error message like:

    error: Your local changes to the following files would be overwritten by checkout:
            foo
            bar
    Please, commit your changes or stash them before you can switch branches.
    Aborting
But, I mean, close enough. It's `git stash branch <newbranch>` and it generates an ugly error message but it does exactly what you want it to do, so you can ignore that error message and hack away.

Now, you're missing the point if you think "God, drostie is really pedantically getting on my case for missing the remote-repository-update and the git stash here! Anyone will learn that workflow eventually!"

The point was not any such thing, the point was clean diagrams when explaining the idea to a fellow developer -- in fact a diagram so clean that a nontechnical user asked about it and accidentally learned enough to get some new vocabulary about how a developer's life works, so that they could more effectively communicate what they want to the developer.

It is my contention that the git diagram, as opposed to the git workflow, is sufficiently messy that a nontechnical eye will lose curiosity and most certainly will not get the idea of "make a branch, push the branch to the shared repository, then update the testing repository, then switch to that branch, then discard that branch if things don't work out." That strikes me as too in-depth for nontechnical casual users to express.


I think people have a way too high tolerance for this kind of crap. We're also kidding ourselves if we think we're smart enough to work with this kind of complexity at no cost.

Our job is often to think up new things. It's really hard to come up with new abstractions when your thinking is muddled by all kinds of incidental complexity.


This is buried but in case anyone reads it, the real reason to open source BK is to show the world that SCM doesn't have to be as error prone or as complicated as Git. You need to understand how Git works to use it properly; BK is more like a car, you just get in and drive.


That metaphor... needs work. Cars need a considerable amount of training to learn to use safely, let alone correctly. I can hack C enough that I dream in it routinely, and due to the resulting brain damage found git intuitive from the start, but there is no way I'll ever learn to drive: it's just too hard.


> Cars need a considerable amount of training to learn to use safely, let alone correctly.

Significantly less training than is required to know the internals of how it operates though.


> I think people have a way too high tolerance for this kind of crap.

I agree.

The core problem with Git is that it was designed to serve the needs of the Linux kernel developers. Very, very, very few projects have SCM problems of similar complexity, so why do so many people try to use a tool that solves problems they don't have? Much of that internal complexity extends up into the Git interface, so you're paying for complexity you don't need.

Others in this thread have praised hg and bzr for their relative simplicity for a DVCS. I'd also like to point out Fossil.

In the normal course of daily use, Fossil is as simple to use as svn.

About the only time where Fossil is more complex is the clone step before checking out a version from a remote repository.

Other than that, the daily use of Fossil is very nearly command-for-command the same as with svn. Sometimes the subcommands are different (e.g. fossil finfo instead of svn status for per-file info in the current checkout) but muscle memory works that out fairly quickly.

Most of that simplicity comes down to Fossil's autosync feature, which means that local checkout changes are automatically propagated back to the server you cloned from, so Fossil doesn't normally have separate checkin and push steps, as with Git. But if you want a 2-step commit, Fossil will let you turn off autosync.

(But you shouldn't. Local-only checkins with rare pushes is a manifestation of "the guy in the room" problem which we were warned against back in 1971 by Gerald Weinberg. Thus, Fossil fosters teamwork with better defaults than Git.)

Branching is a lot saner in Fossil than svn:

1. Fossil branches automatically include all files in a particular revision, whereas svn's branches are built on top of the per-file copy operation, so you could have a branch containing only one file. This is one of those kinds of flexibility that ends up causing problems, because you can end up with branches that don't parallel one another, making patches and merges difficult. Fossil strongly encourages you to keep related branches parallel. Automatic merges tend to succeed more often in Fossil than svn as a result.

2. Fossil has a built-in web UI with a graphical timeline, so you can see the structure of your branches. You have to install a separate GUI tool to get that with most other VCSes. The fact that you can always get a graphical view of things means that if you ever get confused about the state of a Fossil checkout tree, you'll likely spend less time confused, because you're likely also using its fully-integrated web UI.

3. Whereas svn makes you branch before you start work on a change, Fossil lets you put that off until you're ready to commit. It's at that point that you're ready to decide, "Does this change make sense on the current branch, or do I need a new one?"
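
(Point 3 in practice, with a hypothetical branch name:)

  # hack away on trunk, then decide at commit time where the change belongs
  fossil commit -m "risky refactor" --branch risky-refactor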

Fossil's handling of branches is also a lot simpler than Git's, primarily because the local Fossil repository clone is separate from the checkout tree. Thus, it is easy to have multiple Fossil checkouts from a given local repo clone, whereas the standard Git workflow is to switch among branches in a single tree, making branch switches inexpensive.

(And yes, I'm aware that there is a way to have one local Git checkout refer to another so you can have multiple branches checked out locally without two complete repo clones. The point is that Git has yet again added unnecessary complexity to something that should be simple.)


Why is it that some people get Git naturally and some experience a world of frustration trying to use it? I think the kind of problems you describe usually come up if you approach Git with a mindset formed by another SCM. They are typical for people who are proficient with, say, SVN and who try to use Git thinking that Git must work something like SVN. (I'm using SVN as just an example here; it could be any other SCM, but I most frequently see people coming from SVN to really struggle with Git.) Well, Git is nothing like SVN and you'll always be missing something if you try to understand Git through SVN concepts. It's best to forget what you used before and learn Git from a clean slate. Maybe I was just really lucky to never have had to learn SVN (or CVS or ClearCase), so Git concepts and workflows were clear and almost effortless to understand and use. Or maybe it's like the concept of pointers: some people get it right away and others never get it.


> Why is it that some people get Git naturally

The only people I've ever seen "get Git naturally" were developers starting from the implementation details and working their way up (#0)[0].

Everybody else either worked very hard at it(#1)[1] or just rote-learned a list of commands(#2) that pretty much do what they want from which they don't deviate lest the wrath of the Git Gods fall upon them and they have to call upon the resident (#1) or heavens forbid the resident (#0) who'll usually start by berating them for failing to understand the git storage model.

> Well, Git is nothing like SVN and you'll always be missing something if you try to understand Git through SVN concepts.

Mercurial is also nothing like SVN, the problem is not the underlying concepts and storage model, it's that Git's "high-level UI" is a giant abstraction leak so you can't make sense of Git without understanding the underlying concepts and storage model, while you can easily do so for SVN or Mercurial.

[0] because the porcelain sort of makes sense in the context of the plumbing aka the storage model and implementation details

[1] because the porcelain in isolation is an incoherent mess with garbage man pages


> Now that almost works, except the `git branch; git checkout` flow is not the proper way to push changes in the working directory to the new branch. (The context of the conversation was stuff that was already being developed, presumably on the master branch.)

> But, I mean, close enough. It's `git stash branch <newbranch>` and it generates an ugly error message but it does exactly what you want it to do, so you can ignore that error message and hack away.

Isn't

  git checkout -b new_branch
  git commit -a
  git push
what you are looking for?

And as for the push problem:

1. You aren't going to encounter it in git when pushing a newly created branch, but yes, you then have to ssh in and check it out.

2. I wonder how Mercurial handles pushing to a repository with uncommitted changes; does it just nuke them?


Regarding 2 - Do you mean the destination repo has uncommitted changes? There is no need to nuke them, as pushing to this repo will have no influence on the working set: it will just add new changesets in the history!


OK, that makes sense. I imagined that hg implicitly updates the remote working set and that the parent complained about git's behavior because

  hg push ssh://testing-server/
worked as lazy man's single command deployment for him :)


Regarding your first question: Yes, you can also `git checkout -b newbranch`, rather than stashing the changes and then un-stashing them into a new branch; I just tend to stash my changes whenever I see that there are updates on the parent repository. Call it a reflex.

Of course, you can also commit your changes and then `git branch`, which sounds insane (that commit is also now on the master branch!) until you remember that branches in git are just Mercurial's pointers-to-heads. This means that you can, on the master branch, just `git reset --hard HEAD~4` or so (if you want the last 4 local commits to be on the new branch and you haven't pushed any of them to the central repo), and your repository is in the state you want it in, as well. (And you'll need that last step even if you `git checkout -b`, I think.)
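
(In commands, something like the following, assuming the last 4 commits are local-only as described:)

  git branch new-feature        # pointer at the current HEAD, those commits included
  git reset --hard HEAD~4       # move master back past the 4 local commits
  git checkout new-feature      # carry on working on the new branch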

Regarding your second one, Mercurial's simplified model is actually really smart. You have to understand that Git complects two different things into `pull`: updating the repository in .git/ and updating the working copy from the repository. In Mercurial these are two separate operations: you update/commit between the working copy and the repository; you push/pull between two repositories. The working copies are not part of a push/pull at all. So if you push to a repository with uncommitted changes in its working copy, that's fine. The working copy isn't affected by a push/pull no matter what.

With that said, if that foreign repository has committed those changes, Hg will object to your push on the grounds that it 'creates a new head', and it will ask you to pull those commits into your copy and merge before you can push to the foreign repository. (The manpages also say that you can -f force it, but warn you that this creates Confusion in a team environment. Just to clarify: a 'head' is any revision that has no child revisions. In the directed acyclic graph that is the repository history, heads are any of the pokey bits at the end. You can always ask for a list of these with `hg heads`.)

"OK," you say, "but let's throw some updates into the mix, what happens? Does it nuke my changes?" And the answer is "no, but notice who has the agency now." Let's call our repositories' owners Alice and Bob. Alice pushes some change to Bob's repository. Nothing has changed in Bob's working folder.

Now if Alice tells Bob about the new revision, Bob can run an update, if he wants. Bob has the agency here. So when the update says, "hey, those updates conflict, I'm triggering merge resolution" (if they do indeed conflict), he's present to deal with the crisis. Git's problem was precisely "oh, we can't push to that repository because we might have to mess with the working copy without Bob's knowledge," and it's a totally unnecessary problem.

Bob can also keep committing, blithely unaware of Alice's branch, if Alice doesn't tell him about it. The repository will tell him that there are 'multiple heads' when he creates a new one by committing, so in theory he'll find out about her commits -- though if you're in a rush of course you might not notice.

Bob can keep working on his head with no problem, but can no longer push to Alice (if he was ever allowed to in the first place), because his pushes are not allowed to create new heads either. In fact he'll get a warning if he tries to push anywhere with multiple heads, because by default it will try to push all of the heads. However he can certainly push his active head to anyone who has not received Alice's branch, just by asking Hg to only push the latest commit via `hg push -r tip` -- this only sends the commits needed to understand the last commit, and as long as that doesn't create new heads Bob is good to push.
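
(The commands involved, with a placeholder path for the other repository:)

  hg heads                          # list every open head in the repository
  hg push -r tip ../other-repo      # push only the ancestry of the active head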


> Call it a reflex.

PTSD? :) Use local topic branches for everything to avoid unpleasant surprise merges. Once you are ready to merge, pull the shared branch, merge/rebase onto that and push/submit/whatever.

I sometimes keep separate branch for each thing that I intend to become a master commit. This way I can use as many small and ugly commits and swearwords as I please and later squash them for publication after all bugs are ironed out.

This helps with remembering why particular commits look the way they do, especially in high latency code review environments where it can take days or weeks and several revisions to get something accepted.

> Git's problem was precisely "oh, we can't push to that repository because we might have to mess with the working copy without Bob's knowledge," and it's a totally unnecessary problem.

Actually Bob's working copy isn't modified, it's just that if his branch was allowed to suddenly stop matching his working copy, he would probably have some fun committing (not sure what exactly, never tried).


No, the git equiv of svn up is not git pull, but git pull --rebase.


git is the primary example of how bad Linus Torvalds is at writing UI code.... ;o)


From what I read, he connected to a BitKeeper repository via telnet on port 5000, executed the help command and then used that information to write an incomplete client. That does not sound like reverse engineering to me.


It is reverse engineering. It's just easy reverse engineering.


As I remember it, it was a bit of a douche move by Tridgell, driven by a Stallman-like free software ideology.


It wasn't. He gave a conclusive reply which established that it was ethical (just telneting and typing help). Unless you believe Samba and everything else is unethical and you club every reverse engineering effort under one umbrella, your comment is wrong. http://www.theregister.co.uk/2005/04/14/torvalds_attacks_tri...


I don't think it's fair to call people douches because they are committed to their moral principles. Especially so here, where the benefit to humanity over the alternative is so clearly obvious.


It is when they attempt to force their moral code on others.

Is the benefit clearly obvious? If you actually adhere 100% to Stallman's code I'm not so sure.


Tridge made no attempt to force his code on others.

In fact, it was the reverse - he felt like he was being locked out of kernel development because he didn't want to align his moral code with those who used BK.

So, he tried to find a way to hold true to his code without forcing the rest of the kernel team to give up BK.


> a Stallman-like free software ideology

You say that like it's a bad thing.


As I remember it, he did

telnet bk-server 5000

and typed "help".

https://lwn.net/Articles/132938/


That's the "how" not the "why".


So having a genuine need to be able to actually use tools that you wrote, rather than something a company 'licenses' to you, so that you can modify and share these tools, is being a douche? Odd that you would think that companies that treat their users like untrustworthy hackers are not douches, but those users are!



Lots of cross-platform goodies in there as well as some interesting data structures. For example, our list data structure is in lines.c; it's extremely small for a small list and scales nicely to 50K items:

http://bkbits.net/u/bk/bugfix/src/libc/utils/lines.c?PAGE=an...


1 year ago: https://news.ycombinator.com/item?id=9330482

What changed? Is BitKeeper still an ongoing business with some other model, or is that, as they say... it? I hope not.


This is to answer this question and all the "too late" comments.

Too late? Maybe. But we had a viable business that was pulling in millions/year. The path to giving away our stuff seemed like:

     step 1: give it away
     step 2: ???
     step 3: profit!
And still does. So what changed? Git/Github has all the market share. Trying to compete with that just proved to be too hard. So rather than wait until we were about to turn out the lights, we decided to open source it while we still had money in the bank and see what happens. We've got about 2 years of money and we're trying to build up some additional stuff that we can charge for. We're also open to doing work for pay to add whatever it is that some company wants to BK, that's more or less what we've been doing for the last 18 years.

Will it work? No idea. We have a couple of years to find out. If nothing pans out, open sourcing it seemed like a better answer than selling it off.


My $0.02 Canadian: Build something that kicks Gitlab and Github's ass. What an opportunity. Support both BK and Git repos. Provide a distributed workflow that enterprises will love. Enterprises are obviously where the remaining dollars are. There are billions of dollars of inefficiencies in that sector. Many of these enterprises do NOT want to host their code on Github and are buying Gitlab. Be better than Gitlab.


Github sucks for corporate version control, it's just not designed for the kind of strict role-based change control that enterprise needs. Great support though.

I haven't checked out Bitbucket because last time I evaluated (2+ years ago) they didn't have good on-prem options.


^^ this. PLEASE. :-(

Bitbucket has been rotting since Atlassian bought them, and now there's really no "killer app" for Mercurial hosting. There are Mercurial hosting services out there, but nothing anywhere close to Github/Gitlab.


You should check out RhodeCode. It's not a hosted platform, but hosting it yourself is much better. It supports Mercurial, and all the latest things that come with it like phases, largefiles, etc.

Actually, since BK is now open source, we might think of adding a BK backend to RhodeCode and our VCS abstraction layer, which already supports Git, Mercurial and Subversion.


We would love that. Contact dev@bitkeeper.org if you have any questions/issues.


doesn't github support mercurial?


Nope, but Bitbucket does support git.


At GitLab we're unlikely to support BK due to lack of demand. What do you mean with a distributed workflow, something like https://gitlab.com/gitlab-org/gitlab-ce/issues/4084 ?


Great comment. Good points. Also - for enterprise, it's OK if the model ends up being a bit simpler than git - may actually be a positive. Give up some things, but get simplicity that scales to a 1,000 folks using some old VCS.

Looking forward to some hopefully differentiated features.


Just supporting big binary files in a hassle-free way would go a huge way to being better than Git and Gitlab.


What would you consider to be hassle-free?


As someone who's also read about how Git and Mercurial started (and how Bitkeeper is involved in it), I'm interested in seeing how it will play out. I hope it does work out for you and your team. Thanks for getting it out there.

I'm also interested how open-sourcing BK will improve the other systems, too.


> I'm also interested how open-sourcing BK will improve the other systems, too.

#mercurial in Freenode right now is monitoring this thread, very relevant to our interests.

Someone at Facebook in #mercurial right now is trying it on some Facebook repo, to compare performance.


If they are interested we would be happy to help them tune performance.


Ha! I'll be checking that out in Freenode now. (Wonder what mpm would say after all this time...)


Thanks for providing this level of detail; it's interesting to see the considerations that went into your decision.

How / why did you decide to use the Apache license rather than the GPL?

(It seems like a viral license might protect you a little bit, if you want to prevent your competitors from forking and improving your code base and then using it to compete against you.)


We decided to go all in on open source. Given our history, anything but a "here ya go" license wasn't going to go over well. We're aware that someone could fork it and compete against us, good on them if they can. Making money in this space isn't easy and if they can do better than us we'll ask 'em for a job. We know the source base :)

As to why that license, I think it was because LLVM or clang or both had recently picked it and all the lawyers at all the big companies liked that one. We don't particularly care; if everyone yells that it should have been GPL we'll fork it and relicense it under the GPL. Our thought was that Apache is well respected and even more liberal than the GPL, but we can be convinced otherwise.


(Apache2 has a number of explicit clauses that make it preferable for open-sourcing commercial software. For example, it automatically grants a patent license for any patents used by the software, but terminates that license only if a licensee sues over that software [as opposed to React's original patent clause, which could be construed as terminating the license if you sue Facebook at all, and which got them into a lot of trouble], so that contributors can include patches covered under their patents without poisoning it for everyone. It also defines that all contributions are licensed under Apache2 as well, so that if you take patches and then incorporate them into your commercial software, the contributor can't turn around and sue you for them. And it's GPL3-compatible, which many other permissive licenses aren't.)


Great explanation. Allowing use of, but limiting effect on, patents is critical to getting more OSS out of big, patent-loving companies. Lets them know they still have their power and profit while doing something altruistic. Or that helps them in the long run (free labor) while accidentally being altruistic. Works for me either way. ;)


GPLv3 is Apache 2 compatible, which GPLv2 isn't. Most other permissive licenses are compatible with any GPL.


Most GPLv2 licensed software includes the clause "either version 2 of the License, or (at your option) any later version." Through this mechanism a lot of GPL licensed software becomes compatible with Apache 2.

Also, replying to something a little higher in this thread, I wouldn't say that Apache 2 defines all contributions as Apache 2. That section of the license starts with the words "5. Submission of Contributions. Unless You explicitly state otherwise, ..."

And so Apache 2 just becomes the assumed default license on contributions, but it's not at all forced or required that contributions come in under Apache 2.


Please do not GPL this! You've made the sensible choice. Your analysis on making money is spot on.


GPL please!


Here's my dream DVCS: easily self-hostable like Fossil, but with good wiki and ticketing system. (Fossil's wiki and ticketing system are awful, but what really sunk it for me was its unexpected behavior for basic commands like "fossil rm".)

I'm just not super-fond of relying on Bitbucket, reliable though they've been, for hosting my stuff.

But a package I could toss on my own VPS? I'd toss some money at that. Wouldn't even need it to be open-source, but I'm no zealot.


> Fossil's wiki and ticketing system are awful

Care to be more specific?

I'll grant that Fossil's wiki is not a competitor to MediaWiki, but that doesn't make it "awful." It just makes it less featureful. So, what feature do you need in a wiki that Fossil's wiki does not provide?

As for the ticketing system, again, it isn't going to replace the big boys out of the box, but it also doesn't have to match them feature-for-feature to be useful. Also, the Fossil ticketing system's behavior is not fixed: it can be modified to some extent to behave more like you need. Did you even try modifying its behavior, or are you just complaining about its out-of-the-box defaults? Be specific!

> unexpected behavior for basic commands like "fossil rm".

If you mean that you want fossil rm to also delete the checkout copy of the file in addition to removing it from the tip of the current branch, and you want fossil mv to rename the checkout copy in addition to renaming it in the repository, then you can get that by building Fossil with the --with-legacy-mv-rm flag, then setting the mv-rm-files repository option. You can enable it for all local Fossil repositories with "fossil set mv-rm-files 1".

Alternately, you can give the --hard flag to fossil mv and fossil rm. That works even with a stock binary build of Fossil.

> I'm just not super-fond of relying on Bitbucket

For some of us, relying on a cloud service just isn't an option. We're willing to give up many features in order to keep control of our private repositories.

> a package I could toss on my own VPS? I'd toss some money at that.

Fossil runs great on a VPS, even a very small one, due to its small footprint. I wrote a HOWTO for setting it up behind an nginx TLS proxy using Let's Encrypt here:

https://www.mail-archive.com/fossil-users@lists.fossil-scm.o...


Oh, I used Fossil as my only VCS for 3 or 4 years. On my biggest projects, I had a heavily tweaked ticketing system and probably a hundred wiki pages. My experience with Fossil wasn't a "well the defaults suck, next thing" kind of situation for me, I was pretty invested in it.

I also ditched it all 3 or 4 years ago, so my memory's not great, but what got me about the ticketing was that, for whatever reason, I could not sit any non-technical user in front of it and have it make sense to them. No amount of tweaking ever made it make sense for anyone but the devs. I know that's not specific, but this is all in the pretty distant past for me, and that's the takeaway I had from it.

Fossil's lack of any built-in emailing was also lousy. I'm aware that some people rig up some hokey RSS-to-email system to accommodate that, but really, come on, that's awful!

Hey, if fossil actually serves your needs, that's great. I like the value proposition--one file is your repo, wiki, tickets, the whole ball of wax, it's cross-platform, it's just that the execution of the idea didn't work for me.

(As an aside, another thing I didn't like about fossil was its community--tending toward defensiveness and "it's supposed to work that way" instead of "hey, maybe you, the user, are onto something".)


> I could not sit any non-technical user in front of it and have it make sense to them

So name a bug tracker with equivalent or greater flexibility to Fossil's that non-technical users do understand.

I've only used one bug tracker that's simpler than Fossil's, and that's because it had far fewer user-facing features.

Every other bug tracker I've had to use requires some training once you get past the "submit ticket" form. And a few required training even to successfully fill that out!

> Fossil's lack of any built-in emailing was also lousy.

Email is hard. Seriously hard. RFCs 821 and 822 are only the tip of an enormous iceberg. If Fossil only did the basics, it would fail for a whole lot of real-world use-cases, and it'll only get worse as email servers get tightened down more and more, to combat spam, email fraud, domain hijacking, etc.

I, too, would like Fossil to mail out commit tickets and such, but I'm not sure I want the build time for the binary and the binary size to double just because of all the protocol handlers it would need to do this properly. Keep in mind also that Fossil generally doesn't link out to third-party libraries. There are exceptions, but then, I'm not aware of a widely-available[1] full-stack SMTP library, so it would probably have to reimplement all of it internally.

Now, if you want to talk about adding a simple gateway that would allow it to interface with an external MTA, that would be different. I suspect the only thing wanting there is for someone to get around to writing the code. I don't want it bad enough to do it myself.
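
To be concrete about what I mean by "gateway": something hypothetical along these lines, which just hands the message to whatever MTA is already on the box rather than having Fossil speak SMTP itself:

    #!/bin/sh
    # hypothetical notify hook, NOT an existing Fossil feature:
    # read a commit summary on stdin and hand it to the local MTA
    {
        echo "To: dev-team@example.com"
        echo "Subject: [fossil] new commit"
        echo
        cat
    } | /usr/sbin/sendmail -t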

> "it's supposed to work that way" instead of "hey, maybe you, the user, are onto something".)

If you propose something that goes against the philosophy of Fossil, then of course the idea will be rejected. We keep seeing git users ask about various sorts of history rewriting features for example. Not gonna happen. No sense having a philosophy if a user request can change it.

If you're talking about a Fossil behavior that isn't tied to its philosophy, but it just works the way it does for some reason, logic and persuasion are a lot less effective than working code. The Fossil core developers accept patches.

----------

[1] I mean something you can expect to be in all the major package repositories, and in binary form for Windows.


I am happy that Fossil works well for you. I long ago stopped being interested in tweaking my version control system or living with "email is hard, let's move on", and I moved on to other things.

I still think the Fossil value proposition--one file with all your project ephemera--is a good one. It'd be neat if BitMover produced something similar, e.g. maybe not a file, maybe a directory, but the same idea.


I think that BitKeeper could fill the enterprise niche (like Perforce or ClearCase) - very big organisations with a lot of developers don't like it that every developer can check out the whole source tree. They usually like to have access control by department/group. Also, stuff like 'read only access' or 'right to commit' can be added for greater bureaucratic bliss.


There are plenty of people doing the "let's turn open source and blame the community if it doesn't work" thing around.

It's a convenient excuse: blame open source for breaking your business, and for failing to save you from dying...


A South Park reference, eh?

Well clearly in retrospect, step two should have been renaming it "Dawson's Creek Trapper Keeper Ultra Keeper Futura S 2000" [1], adding incredibly advanced computerized features including a television, a music player with voice recognition, OnStar and the ability to automatically hybrid itself to any electronic peripheral device, absorbing the secret military computer at Cheyenne Mountain, and taking over the world.

[1] https://en.wikipedia.org/wiki/Trapper_Keeper_(South_Park)


I have some questions about Why.html: https://www.bitkeeper.org/why.html

> Spending a lot of time dealing with manual and bad auto-merges? BitKeeper merges better than most other tools, and you will quickly develop confidence in the quality of the merges, meaning no more reviewing auto-merged code.

Do you have examples of merge-scenarios that are a Conflict for git but resolve for BK?

> BitKeeper’s raw speed for large projects is simply much faster than competing solutions for most common commercial configurations and operations… especially ones that include remote teams, large binary assets, and NFS file systems.

Is there a rule of thumb for what size of repos benefits from BK? (And I suppose size could either be the size of a current commit or the total size of the repo.)

Are there any companies like github or bitbucket that support BitKeeper repos?


Wayne pointed to some stuff over on the reddit thread.

As for size, it's csets * files; as that gets big, Git slows down faster than linear, while we're pretty linear.


I think you guys undersell BAM. That was such a clutch feature where I used BK. It's sad seeing git's large file handling only just show up; I guarantee it has a long way to go to get parity with BAM.


Amongst all the "too late, I loves me some git" type comments, I figure I'd say thank you and good luck with continued revenue.

I haven't read much about bk so far, so forgive my lazy web question: does/can bk operate over standard ssh as git/hg/svn can, or does it require a dedicated listening server to connect to?

Edit: answering my own question, yes it does support ssh as a transport


How does BitKeeper scale to large projects? (Like, say, gigabytes of binaries.) This is a weak area of Git.

---

From the "Why" page:

BitKeeper’s Binary Asset Manager (BAM) preserves resources and keeps access fast by providing local storage as needed.

BAM is great for any organization that handles:

* Videos

* Photos

* Artwork

* Office files

* CAD files

* Any large binary files


I've been using BK/BAM for my photos; it's got 56GB of data in there and works great. I cheat because I added a way to check things out that uses hardlinks instead of copies, and I can check out the whole tree in 6 seconds. Doing the copy takes a lot longer: 9+ minutes. Hardlinks rock.
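
You can see the same idea with plain GNU cp, for anyone who hasn't played with hardlinks (directory names made up):

    $ cp -al photos/ checkout1/   # hardlink "copy": near-instant, no extra data stored
    $ cp -a  photos/ checkout2/   # real copy: duplicates every byte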

On the commercial site there is a link to some BAM paper, take a look at that and maybe ask in the forum or irc if this gets lost.


Wonder if BitKeeper might be a viable alternative for Git LFS (https://git-lfs.github.com) then.


I half-expected 'very late' comments before I read the comments. I wasn't disappointed.

For those who commented that way, please reconsider this winner-takes-all approach to your outlook on the world. The world is better because of choice, and it's in everybody's best interest to have more distributed version control systems.


Late does not mean it is useless.

The argument that diversity is good is not so simply true; there are tons of benefits to diversity, but there are costs too: fragmentation of talent, support, time to fix bugs, and eyes on each project; developer headaches in supporting competing standards; and so on...


Why would I want to use this over git or mercurial?


You wouldn't, unless you have very specific niche needs. They're pretty upfront about it:

>Why use BitKeeper when there are lots of great alternatives?

>For many projects, the answer is: you shouldn’t.

https://www.bitkeeper.org/why.html


Probably the single biggest reason, aside from being easier to use than git's CLI, is that it has sub-modules that work exactly like files do in a repository. No extra options, just clone/pull/push/commit/etc. Full-on distributed workflow.

BitKeeper itself is a collection of repositories. Download an install image, install, and clone it:

    $ bk clone http://bkbits.net/u/bk/bugfix
    $ cd bugfix
    $ bk here
    PRODUCT
    default
    $ bk comps -m
    ./src/gui/tcltk/bwidget
    ./src/gui/tcltk/tcl
    ./src/gui/tcltk/tk
    ./src/gui/tcltk/tkcon
    ./src/gui/tcltk/tktable
    ./src/gui/tcltk/tktreectrl
    ./src/win32/dll/scc
    ./src/win32/dll/shellx
    ./src/win32/dll/shellx/src
    ./src/win32/msys
    ./src/win32/vss2bk
which shows that what we clone by default doesn't include all that other crud (we cache the build result from that and populate it as needed to do builds).

Play with it, it's very different from Git, the subrepo binding is just like file bindings. Everything works together and obeys the same timeline.


It claims to be able to handle binary files well, which would be a big deal for game development. Game studios have mostly passed on git and Mercurial since those can't handle game assets.


Some game shops have had good luck with Mercurial's largefiles extension, for what it's worth.


It looks like it doesn't have locking, which is the other half of what you need and why most shops go with Perforce.


I'm not sure how locking would work when you're distributed. It really only makes sense for a centralized VCS.


Mercurial has had 'largefiles' since Mercurial 2.0, 5 years ago.


While it is better than git at working with large files, the large file extension doesn't solve the problem 100%.


They actually can. There is just no good way to version many file formats. Git-pack would have to be extended with sensible compression support for so many formats it is not even funny. And leaving those files outside version control is as easy as always.

About the only difference is that git prefers to keep the whole history and you cannot yet set a per-submodule shallow clone policy. It wouldn't even be too hard to add that.


I don't agree about leaving files outside version control - this always ends in tears.

Regarding versioning, you can always version any file by storing every revision and compressing them as best you can. I believe this is what Perforce does. Repo size can of course become an issue, and git doesn't do a great job with that, since it stores everything, and stores it locally. Perforce can at least discard old revisions and lets you select history depth on a per-file or file type basis.

The more serious problem with git in my view is that there's no good automated merging tools for many types of files, nor are any likely to arise. And more importantly, most people working on your average game aren't interested in forming in-depth mental models of how their tools work, and certainly don't want to have to pick up the pieces when they go wrong. So for most files, you need an exclusive lock (check in/check out, lock/unlock, etc.) model, or similar. That works quite well. But for obvious reasons, git just doesn't support this model at all, and I believe Mercurial is the same - and no amount of transparent/magic large file storage backends or whatever are going to fix that.


Feature-set-wise, this has a number of great advantages over git! It's a shame that all of the tools today are so git-centric in some ways.


If this works well then this indeed is a HUGE reason to use BK!


Try it and let us know. You can download the binaries at bitkeeper.org and then clone the repo like so:

    $ bk clone http://bkbits.net/u/bk/bugfix/
type make and you should have a working BK built from source.


Are you aware of any hosting solutions that support BK as-is? E.g. something like Gogs or similar.

Asking because I'm looking for a Gogs/GitLab-like server-side solution for a project under development. However, it needs to handle binary data well, which Git-based solutions don't.


We've got a very primitive hosting service up at bkbits.net. One of the ways we hope to survive is to evolve that into something closer to Github.


I really hope that takes off -- I'm interested in something that's like Git but which actually makes sense, without costing lots of neurons to understand how to use it in practice.


Interesting. Looking for the source of that, is this it?

  http://bkbits.net/u/wscott/bkbits/


Consider using Git LFS for binary data. Our GitLab.com supports it up to 10GB per project; self-hosted installations are limited only by disk volume.


Yeah, it's a possibility. Just hoping for something better. :)


BitMover still holds all the copyright, and have all the developers. They obviously wanted to keep BitKeeper proprietary, and are only doing it now when facing irrelevance in the marketplace. If BitKeeper becomes popular again, who’s to say they won't take development proprietary again? Sure, the community could fork the latest free version, but there isn’t a free development community for BitKeeper – they’re all internal to BitMover.


  $ bk clone bk://bkbits.net/bkdemo/bk_demo
  $ cd bkdemo
  # edit files using your favorite editor
  $ bk -Ux new
  $ bk commit -y"Comments"
  $ bk push
As a user whose first VCS was git, I am quite confused by this "quick demo". I have no idea what "-Ux" means, no idea what "new" means, no idea what "-y" means and why it is immediately followed by quotation marks instead of being separated by a single space. If bk wants to get new users on board, it needs a better quick demo that makes sense to new users.


According to http://www.bitkeeper.com/testdrive:

* The -U option to bk tells it to operate on "user files". That is files that are not part of the BitKeeper metadata

* The modifier x corresponds to "extras", files which Bitkeeper doesn't know about (changed files is c)

* `new` adds files to the repository

* [on commit] the -y option is for changeset comments (~commit messages)

So `bk -Ux new` is `git add <untracked files>`[0] and `bk commit -y"thing"` is `git commit -m "thing"`

[0] aka `git add $(git ls-files -o --exclude-standard)` or `git add -i<RET> <a> <*> <q>`


Too late to dominate, but maybe not too late to carve out a niche. It seems to have some advantages over the competition, and appears to bring a reasonable contribution to the table. Besides, competition is always good.

At the very least, Bryan Cantrill will be happy :-D.



I'm wondering: how does it handle large binary files? Any better than git or hg without extensions?


Yes. Binaries are handled by one or more servers, we call them BAM servers. The servers hold the data and your repo holds the meta data, binaries are fetched on demand.

You can have a cloud of servers so the binaries are "close" (think China, India, US).


Two questions:

It is unclear to me if the BAM server is part of this open sourcing or not. The page talks about a 90-day trial.

Also, it is common in other (usually non-D)VCS workflows to lock binary files while working on them, since concurrent changes can't be merged the way text files allow. Does BK support anything like this?


We open sourced everything. So yes, it's there. The commercial site is out of date.

We have not done the centralized lock manager, we didn't get commercial demand for that (yeah, surprised me too). We could do it though, it's not that hard.


Thank you!


The BAM server is part of the open source version. The 90-day trial is for the enterprise version (which is the same version, only with commercial support).

BitKeeper doesn't support locking binary files.


Great news! Better late than never! I hope they (or a client of theirs) create a BK-backed service soon. I, for one, think we need more than just GitHub and Atlassian in the market, if only to ensure the businesses don't take their users for granted (hint: SourceForge).


There are tons of alternatives to github/atlassian/sourceforge:

https://en.wikipedia.org/wiki/Comparison_of_open_source_soft...


Huh. Thanks for doing this. As a MySQL employee in the early days I used BitKeeper and fell in love with it and kept using it as long as I could. I mainly use Git these days, but frequently miss BitKeeper -- BK felt a lot more natural to me than Git ever has.


Hey Jeremy, long time no talk. We have the MySQL 6.x tree in BK; we can put it up on bkbits if you like.


Something I'm wondering that the man page doesn't make clear: does it track files across renames, or does it only track content like git?


It tracks renames, it's not like git. Every file has an internal identifier, that's the actual file id, the name is a versioned attribute of the file.


What's not clear from your replies is whether it tracks renames like Mercurial does, by having users run a manual command to ensure the VCS knows about the rename. Unless bk has a file system monitor, I'll assume that's what it does. Unfortunately, data on a few Mercurial repositories I looked at (Mozilla's and Mercurial's) shows that people don't mark all file renames.


The only way I know of to instruct Mercurial to do this is with

    hg addremove -s
Is there another way to indicate a rename?


How does it detect renamed files in the filesystem?

Thank you for open sourcing by the way, I can definitely see how some features (binary file handling, submodule handling) could be useful for large-scale projects like games.


Each history file contains its internal name, much like a file system has an inode # that is the internal name for that file. We call these names "keys" and you can dig out the inode key like so (every delta has a key; the first delta's key is called the rootkey):

    $ bk log -nd:ROOTKEY: -r+ slib.c
    lm@bitmover.com|src/slib.c|19970518232928|52808|f3733b2c327712e5
The key is user@host|relative path|UTC|checksum|64 bits of /dev/random and they are guaranteed to be unique if your DNS config is correct (we don't create duplicate keys, look for the uniq db in the code).

The way we version the entire tree is simple, it's logically (implemented differently for performance):

    <rootkey> <delta key>
    <rootkey> <delta key>
    ...
where each rootkey uniquely identifies a file and each delta key identifies the tip as of that commit.

Not sure if that is clear enough or not, ask away if not.


Not tracking namespace accurately is Git's biggest weakness and would most probably be BK's best argument for getting a tryout. With Git, renaming a file and editing it at the same time tends to make a mess of the history and cause much pain. You can say just don't do that, but I say it happens, and it should just work.


That and a UI that is uniform and sub-modules that work correctly and a system that doesn't let you do things that mess up your data. Oh and blame that is fast:

  $ time bk annotate slib.c | wc -l
  18508

  real    0m0.031s


What happens when you duplicate a file? Does it track history back to the original parent? Can it help with future merges?


You mean copy the file? There is a bk cp command that copies the file and gives it a new rootkey. But from that point on the histories of the two files diverge.


Yeah, copy. I see, thanks.


Yes, files are tracked as first class objects as they are renamed.


Can it import from git or SVN or mercurial?

Looking at the bk import man page, it looks like it cannot import from any modern VCS. I see only RCS, SCCS, CVS, and MKS as options. This is unfortunate, as I have a mercurial tree I'd like to import.


We have a git importer but it's not part of the "official" repo as there are still a few corner cases. I am afraid we don't have a mercurial importer though...


You probably should publish the git importer -- it makes it really hard to try bitkeeper if you have to play with a pretend tree.


Yeah, we're trying to figure out how to do that.


+1 would definitely give it a try with this feature


Same here, I want to get back to my BK love


Well, that took a long time... I wonder what changed in the eleven years since Git and Mercurial were deployed to replace BitKeeper.


    "The ability to seamlessly share only a subset of your source tree "
I've spent a good 10 mins trying to find anything specific in the documentation about this but have come up empty. Is this just by virtue of using submodules, ssh, and filesystem permissions, or is there something more that I've yet to find? The lack of fine-grained security on modern VCS systems is one of the reasons our monolithic repository is still using CVS.

On a related note, the getting started documentation should be more prominent on the Web page.


There is an official mirror on GitHub:

https://github.com/bitkeeper-scm/bitkeeper


It's a read-only mirror; the read/write mirror is on bkbits.net. But we'll maintain the mirror (or you can - bk has fast-export, which creates a perfect mirror in git).
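
Roughly, rolling your own mirror looks something like this (paths are made up; check the fast-export man page for the exact usage):

    $ git init --bare /path/to/mirror.git
    $ cd /path/to/bk/repo
    $ bk fast-export | git --git-dir=/path/to/mirror.git fast-import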


Would you describe the process of exporting to git as pretty painless? If so, I'll look at adding BitKeeper support for my git analytics/search tool. I've uploaded some pictures that show what the git repos you have hosted on GitHub look like at:

http://imgur.com/a/nVvov

Since the export process adds "bk: <changeset>" to the commit comment, it'll be easy to tell that it was created via your fast-export tool, which means my tool can easily point you back to the bitkeeper web interface.


Is there a way to do the reverse, and convert a Git repository into a BitKeeper repository? I found the fast-export manual page:

https://www.bitkeeper.org/man/fast-export.html

But it doesn't look like there's any fast-import. Do you have some recommended way if I wanted to convert an existing repository to try BitKeeper out?


By "fast-export", you mean Packard's "fast-import" format, editable with ESR's RepoSurgeon?


The biggest feature for me is the efficient handling of large binary files, because it means I could finally have a completely self-contained repository (clone and everything is in one place, plus free replication), but without the performance penalties which for example Mercurial incurs with binary files:

https://www.bitkeeper.org/why.html

I have to try it out just for that!


This predates git; in fact, if it had been open sourced from the start, git may never have existed. Sigh, how ironic.

If BitKeeper had been open sourced back then, it could be a powerhouse nowadays, both as open source and commercially. Now it is too late and, honestly, irrelevant.


"[...] Linus moved to it and most of the developers followed. They stayed in it for three more years before moving to Git because BitKeeper wasn't open source."

Um, the "because" part is not quite right.


Glosses over things, but is essentially accurate. Lots of people were not willing to use a proprietary tool, which prompted some reverse-engineering, which caused BK to withdraw their offer.

If all free software activists had accepted the compromise of using the free-as-in-beer BK, git would never have been created.


That's not:

They stayed in it for three more years [...] because BitKeeper wasn't open source.

but

They stayed in it for three more years before [moving to Git because BitKeeper wasn't open source].


I think the same points made in Larry's 1993 paper could be made about various Linux distributions:

  Why a gazillion package managers?
  Why not a common filesystem layout?
  Why not a standard desktop?
IMO, Linus should enforce his Linux trademark by forcing every distribution to follow a set of standards. If they don't, they can't call it "Linux". If he got them in a room and said "This is the way it's going to be, or else", they'd do it.


The people that you are looking for are the systemd and FreeDesktop people. The former have a manifesto addressing this:

> The emphasis of systemd to provide a platform instead of just a component allows for closer integration, and cleaner APIs. Sooner or later this will trickle up to the applications. Already, there are accepted XDG specifications [...] that are not supported on the other init systems.

> systemd is also a big opportunity for Linux standardization. Since it standardizes many interfaces of the system that previously have been differing on every distribution, on every implementation, adopting it helps to work against the balkanization of the Linux interfaces.

-- http://0pointer.net/blog/projects/why.html

They have a systemd filesystem layout that they say modernizes the FHS:

* https://freedesktop.org/software/systemd/man/file-hierarchy....

They have a project to rearchitect packaging:

* http://0pointer.net/blog/revisiting-how-we-put-together-linu...

They have the aforementioned specifications:

* https://specifications.freedesktop.org/

* https://www.freedesktop.org/wiki/Specifications/

They have a systemd DNS client:

* https://www.freedesktop.org/wiki/Software/systemd/writing-re...

They even have events where people get in a room to be told "the way it's going to be":

* https://news.ycombinator.com/item?id=10519578

* https://ti.to/systemdconf/systemdconf-2016


Some history from Linus himself https://www.youtube.com/watch?v=4XpnKHJAok8


Interesting — FreeBSD 7 and 8 binaries available for download. Neither of those is a current supported release. It's like offering RHEL 3 or 4 binaries.


We found that by maintaining a build cluster with many old releases we tend to keep compatibility issues out of the code. Also, FreeBSD is very good about backwards compatibility so current releases will run these binaries just fine.

However we will update the build targets as needed by users.


+1 For FreeBSD support +1 For open sourcing BK

I hope it works in your best interest. And I wish you all at BK the absolute best and thank you for all your incredibly hard work over the years.


BTW, sorry to say we don't have a RHEL 3 release for you, but in the 'complete list' area you can find stuff for RHEL 4. ;-)

Really they are Debian 4, 5, 6, 7, & 8 but they match up with Red Hat pretty well.


RHEL 4 is still supported. Extended support is good for another year.


I see this as the "features" list: https://www.bitkeeper.org/why.html

See large repo support, security, and others.

Is that geared towards comparing with Git/GitHub? Is there a more focused comparison with those, i.e. one comparing both to git itself and to GaaS (Git as a Service)?


The nested repository feature sounds amazing. Dealing with both git submodules and git subtrees has been a huge pain for me.

I'm looking forward to trying this out over the weekend. Is there some kind of util/script to import history from git?


Working on getting a crappy one out there; we have "simple and pretty crappy" and "complex/fragile but less crappy".


Does this come with any sort of web interface?


Yes, there is hosting at bkbits.net and if you drill down to

http://bkbits.net/u/bk/bugfix/

that's bk/web which is included in the release.


Great to see this finally happen... However, for 'us' Git remains a keeper.


This is very cool... but also, kind of a bit late. The market already adopted git and the momentum is there. Unless there is a trivial way to switch back and forth from git or there is something that is orders of magnitude better, this is a decade too late.


'A bit late' might be understatement of the day :)


Too late?


Too late :)


You and a lot of other people say that. Sure, if we want to take over from Git, it's late. But Git has left us with an opening, the only way Git works for the masses is Github, Git itself is too complicated and people "lose" their data (they don't but Git makes it appear like they did).

I think people will play with BK and find out that it can work for everyone without something like Github (we still need it but it's a nice to have, not a requirement).

We'll see. When I was proposing BK the intertubes said it would never work. I'm a little skeptical of the naysayers.


The "too late" comments are depressing since they totally lack insight (who hasn't noticed the dominance of git?). Yet here on HN specifically, you would expect people to cheer for pivoting into something new. Thanks for open sourcing BK.


Nah, it's easy. Before every git command, I just tar up the source. When git complains, I can untar to get things working again. To work with others, I fetch a new tree and then use "diff" and "patch" to merge my changes into the new tree.
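
Spelled out, the loop looks roughly like this (directory names made up):

    $ tar czf ~/backups/proj-$(date +%Y%m%d).tgz proj   # snapshot before touching git
    $ diff -ruN proj-pristine proj > my-changes.patch   # capture local work
    $ cd proj-fresh && patch -p1 < ../my-changes.patch  # re-apply onto a freshly fetched tree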

(seriously, as an experienced professional developer, I actually do this much of the time)


Seriously? Your life would be easier if you simply learned to use git properly.


Well, it's that bad. When I try to do things the "right" way I'm constantly exposed to the innards of git. I don't care about that stuff. It's complicated. It's a distraction from, you know, the actual task I was trying to do before git got involved.

I've done significant work with 5 other version control systems, including BitKeeper and ClearCase. Nothing is as difficult as git. At this point, I give up. Screw it.

I can do diff, tar, and patch. I have an editor. That'll do. I won't miss the confusing errors. Most importantly, I trust that these simple tools will not eat my work.


It's too late. There is no reason to use a non-git DVCS in 2016.



