APFS – A Backup Software Developer’s Perspective (macdaddy.io)
168 points by 01001 on Sept 23, 2017 | 53 comments



If your backup tool uses rsync under the hood, which is just a normal user-space process that uses the standard filesystem APIs, why does it matter what the underlying storage is? Obviously filesystem bugs can cause issues with this assumption, but filesystem bugs can break a whole host of other things in your backup tool.


If I understood the blog post correctly, the backup software turned out to basically work just fine on APFS. It’s the data recovery software that will take some time to port over.


This was what I was left wondering as well. rsync for sure does not have any FS-specific code that I can find. If it uses the FS APIs provided by the OS, just like every other program that touches files, why would we expect anything other than boring old rsync doing its thing?


rsync has a ton of HFS+ specific code. Making a faithful duplicate of a file (specifically its metadata) is a feat on its own: http://blog.plasticsfuture.org/2006/03/05/the-state-of-backu...


Upstream rsync does not seem to have any special-casing for mac/hfs. I'd wager Apple's extended-attribute handling is added in this patch[0] for the version distributed with macOS.

Even here, there isn't special casing for HFS. Instead, a special library function, copyfile()[1], is used to handle copying files and their associated metadata.

It seems this function was introduced in Mac OS X 10.5, which was after the article you linked. I'd wager copyfile() was introduced in response to the unwieldy file copy mechanics.

After discovering how copyfile() is used in rsync, I am fairly confident that rsync works so well on the new FS as a result of Apple implementing a fairly solid copyfile() for APFS.

[0] https://opensource.apple.com/source/rsync/rsync-20/patches/E...

[1] http://www.manpagez.com/man/3/copyfile/
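To make the distinction in this comment concrete (copying a file's bytes versus copying the file plus its metadata), here is a minimal Python sketch. It is not Apple's copyfile() and not what the patched rsync does; it only contrasts the standard-library `shutil.copyfile` (contents only) with `shutil.copy2` (contents plus permission bits and timestamps). The helper names `copy_both_ways` and `demo` are made up for this illustration.

```python
import os
import shutil
import tempfile


def copy_both_ways(src, workdir):
    """Copy src twice: once data-only, once with metadata. Returns both paths."""
    data_only = os.path.join(workdir, "data_only.bin")
    with_meta = os.path.join(workdir, "with_meta.bin")
    shutil.copyfile(src, data_only)  # file contents only
    shutil.copy2(src, with_meta)     # contents + mode bits + timestamps
    return data_only, with_meta


def demo():
    """Build a throwaway source file and report the mtimes of both copies."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "source.bin")
        with open(src, "wb") as f:
            f.write(b"payload")
        # Give the source a distinctive, old modification time.
        os.utime(src, (1_000_000_000, 1_000_000_000))

        data_only, with_meta = copy_both_ways(src, workdir)
        return {
            "src_mtime": int(os.stat(src).st_mtime),
            "data_only_mtime": int(os.stat(data_only).st_mtime),
            "with_meta_mtime": int(os.stat(with_meta).st_mtime),
        }
```

The data-only copy gets a fresh modification time, while the `copy2` copy carries the source's timestamp. A backup tool on macOS has far more metadata than this to carry across, which is the gap a call like copyfile() is meant to close.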


You're looking in the wrong place. Install the homebrew version, it has 3 patches for macOS that aren't in the upstream.

https://github.com/Homebrew/homebrew-core/blob/master/Formul...


The Mac version has some file system specific code https://developer.apple.com/legacy/library/documentation/Dar...:

"-E, --extended-attributes copy extended attributes, resource forks"


Filenames are treated a bit differently on APFS. Not sure what else, but it seems there's a lot of other bizarre file metadata on OS X that you might want to sync.

edit: not really sure though, because it doesn't seem like this program cares about any of those metadata.
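The comment does not say which filename difference it has in mind. One commonly discussed candidate, offered here as an assumption, is Unicode normalization: HFS+ stores names in a decomposed form, while APFS keeps the name as it was given. The Python sketch below only shows why that matters to a tool that compares names byte for byte; the function name is hypothetical.

```python
import unicodedata


def same_name_ignoring_normalization(a, b):
    """Compare two filenames after bringing both to the same normal form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


composed = "caf\u00e9"     # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

# The two strings render identically but are different sequences.
print(composed == decomposed)                                  # False
print(same_name_ignoring_normalization(composed, decomposed))  # True
```

A sync tool that compares raw names could treat these as two different files, so comparing on a normalized form is one defensive option.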


It cares about that metadata a lot. Backups won't be bootable without it. There is also disastrous data loss without it, because a lot of files have their actual file data embedded in the metadata (resource forks et al). It's nuts because it's based on a 30-year-old filesystem. That's one of the reasons why the smooth transition is impressive.


And if you're just using rsync under the hood, you're making money off a fancy GUI with pretty minimal actual effort as well.


That's the way it looks until you actually try to make a commercial piece of software by doing that.

As it mentions in the blog post, I come from a background of low-level system programming (data recovery, even some kernel extensions, etc). It would not have added significant time to roll my own file copy tool. Regardless, it took about 1000 hours of work to make that app, as it says in the blog post. Things are nowhere near as easy as they look. Doing the actual file copy is the easy part.


Let's suppose you had a fancy GUI wrapper around rsync which covered all the command line options, and a tiny output parser to display progress information. What more effort is needed to get a commercial, market-ready piece of software?
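For a sense of what the "tiny output parser" in this question could look like, here is a rough Python sketch. It assumes progress lines shaped like the example in the rsync man page (byte count, percentage, rate, time remaining) and that updates may be separated by carriage returns; the exact format varies between rsync versions, and the function names are invented for this example.

```python
import re

# Matches lines such as:
#   "      1,238,099 100%  146.38kB/s    0:00:08 (xfr#5, to-chk=169/396)"
# The thousands separators are optional because older versions omit them.
PROGRESS_RE = re.compile(
    r"^\s*(?P<bytes>[\d,]+)\s+(?P<percent>\d{1,3})%\s+"
    r"(?P<rate>\S+)\s+(?P<eta>\d+:\d{2}:\d{2})"
)


def parse_progress(line):
    """Return a dict for a progress line, or None for any other output."""
    match = PROGRESS_RE.match(line)
    if match is None:
        return None
    return {
        "bytes": int(match.group("bytes").replace(",", "")),
        "percent": int(match.group("percent")),
        "rate": match.group("rate"),
        "eta": match.group("eta"),
    }


def latest_progress(chunk):
    """Return the most recent progress update found in a chunk of raw output."""
    latest = None
    for piece in re.split(r"[\r\n]+", chunk):
        parsed = parse_progress(piece)
        if parsed is not None:
            latest = parsed
    return latest
```

The parser really is the small part. Everything around it (error handling, scheduling, permissions, bootability) is where the replies below say the time goes.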


I just answered this in another response. In addition to that answer, you could answer a lot of these questions yourself by actually trying the app. It has animations built into it, for example, and it's clear at a glance that it's far beyond what you described:

It's also not just a GUI. There is loads of code in there that does stuff. For example:

• Scheduling for scheduled backups

• A background daemon that looks for events to launch the UI for scheduled backups

• Manipulation of disks ownership stuff in order to allow backups with permissions to happen correctly

• Volume blessing, etc, in order to make bootable volumes

• Trash handling & maintenance for deleted files

• Deleting files (for removing old backups)

And that was just off the top of my head. The app is many thousands of lines of code and it performs literally hundreds of functions. They're nearly all transparent to the user, and that is the idea.


I used to think like that. It turns out things get out of hand quickly once the software has to be used by non-technical users. I also build a backup app, but in my case for virtual machines [0].

As feelix says it is easy to burn 1000 hours on it. In addition to what feelix says you need to write help, build a site, do a lot of testing, write an installer etc. It does not compare at all to writing a script for technical users.

[0] http://www.vimalin.com


I highly recommend the essay "The Programming Systems Product" from the collection "The Mythical Man-Month" which provides an answer to this exact question.


Come on, be fair: half the web properties are a GUI to something you could have for free on the CLI. Making a usable GUI is not a trivial task.


It's also not just a GUI. There is loads of code in there that does stuff. For example:

• Scheduling for scheduled backups

• A background daemon that looks for events to launch the UI for scheduled backups

• Manipulation of disks ownership stuff in order to allow backups with permissions to happen correctly

• Volume blessing, etc, in order to make bootable volumes

• Trash handling & maintenance for deleted files

• Deleting files (for removing old backups)

And that was just off the top of my head. The app is many thousands of lines of code and it performs literally hundreds of functions. They're nearly all transparent to the user, and that is the idea.
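Two of the items in the list above, scheduling and removing old backups, can be sketched in a few lines of Python. This is a guess at the general shape of such logic, not the app's actual code; the keep-the-newest-N policy, the fixed interval, and both function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def select_backups_to_delete(backup_times, keep_latest):
    """Return the backups that fall outside a keep-the-newest-N policy."""
    if keep_latest < 0:
        raise ValueError("keep_latest must be non-negative")
    newest_first = sorted(backup_times, reverse=True)
    return newest_first[keep_latest:]


def next_run(last_run, interval, now):
    """Return the first scheduled time strictly after `now`.

    Runs are assumed to happen at last_run + k * interval, so a machine that
    was asleep for several intervals skips ahead instead of running late.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    steps = (now - last_run) // interval + 1
    return last_run + steps * interval
```

Even this toy version has to decide what happens after missed runs, which hints at why the real thing took far longer than a wrapper script would.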


Presumably his customers feel it is worth the money, they wouldn't buy it otherwise.


And this is how you do SEO-oriented content marketing, kids.

Rather light on the substance, few select links (with at least one pointing out) and lots and lots of precious keywords.

--

Besides, the fact that a _file_ backup tool might need a near complete rewrite with a change of a file system should raise some eyebrows. It might be understandable for an imaging backup or if it was working directly with raw file system, but this one doesn't.


I finished up the APFS compatible version of the software way quicker than I thought I would, and was so happy and energized that as part of my release I whipped up a quick blog post about the experience.

Indeed it's pretty light on content, it's more just a nod to Apple for making this a painless process rather than an excruciating one, as well as an early relay of the experience to others who are going to upgrade who may be harboring some concerns like I was.

The marketing for this site sucks, the software has been around since 2011 and it has nowhere near the marketshare of its competitors because it sucks. If I were really some SEO genius (and you'll see the google rankings are terrible), then this would not be the case. In short, I wish I was the evil genius that you claim, but I'm not. I'm just a dev sharing his experience with the filesystem.

In regards to it not implementing the file copying tool directly - that is covered in the article. The fact that the file copying tool was compiled in 2015 and still works flawlessly is the whole premise of the article stating that it's impressive. And it being a _file_ copying tool is the part that _makes_ it impressive. You seem to have no idea how complex the metadata, built up over 30 years without a rewrite, on HFS+ is. Take a look at BackupBouncer and the associated blog posts to get an idea of the situation. It's pretty crazy.


> Besides, the fact that a _file_ backup tool might need a near complete rewrite with a change of a file system should raise some eyebrows.

It appears you've never written file backup software for MacOS. I have, and it's anything but trivial to get all the edge cases right. There are half a dozen different schemes for metadata and you have to handle all of them. There are packages which are really directories. ACLs. Two different kinds of softlinks, plus hardlinks. Finder attributes. Hidden files and caches that usually don't matter, but sometimes do. Creation dates, which don't exist on [older versions of] Linux. And legacy cruft like resource forks and file types, which some users still depend on. Spotlight comments (née Finder comments). Tags. Special locks added by Time Machine. Weird new locks that even root can't penetrate. (You have to fail gracefully on those, as the author points out.)

It's a nightmare. Go write a complete file-level backup system or a transparent VM file-mirroring solution for HFS+ and get back to me. In the meantime, you'll just have to trust the original author when he says that Apple's seamless transition to APFS is astonishing.
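As a small taste of the edge cases listed in this comment, the Python sketch below classifies entries without following symlinks and flags regular files that have more than one hard link. It only covers what POSIX `lstat` exposes; ACLs, Finder attributes, resource forks, tags and the various locks all need separate handling that is not shown. The function names and file names are made up for the example.

```python
import os
import stat
import tempfile


def classify(path):
    """Classify a path without following symlinks."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return "symlink"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    if stat.S_ISREG(st.st_mode):
        # More than one link means another name points at the same inode.
        return "hardlinked-file" if st.st_nlink > 1 else "file"
    return "other"


def demo():
    """Create one entry of each kind in a temp directory and classify them."""
    with tempfile.TemporaryDirectory() as root:
        plain = os.path.join(root, "plain.txt")
        linked = os.path.join(root, "linked.txt")
        alias = os.path.join(root, "alias.txt")
        sub = os.path.join(root, "sub")
        for path in (plain, linked):
            with open(path, "w") as f:
                f.write("x")
        os.link(linked, linked + ".hard")  # second name for the same inode
        os.symlink(plain, alias)
        os.mkdir(sub)
        names = ("plain.txt", "linked.txt", "alias.txt", "sub")
        return {name: classify(os.path.join(root, name)) for name in names}
```

A backup tool that copies a hard-linked file as two independent files, or follows a symlink instead of recreating it, produces a restore that differs from the original in ways users only notice later.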


Incidentally, I did work on a backup program for Windows and Unix-ish OSes some years ago. There are indeed locked files, there are hardlinks, symlinks, junctions and other types of reparse points that will give HFS+ a run for its money, there are alternate streams, DACLs, SACLs, there is a monstrosity called volume shadow copying, there is a lack of support for created times on some file systems, times that are supported are truncated differently, some file names are not supported here, but supported there, etc. Lots and lots of cruft accumulated over past decades.

But it's not a nightmare. It's just a very large pile of mostly trivial stuff.

Also none of this should require a "complete rewrite" if you are to add support for yet another file system. Saying something like this means that either existing program is a bowl of spaghetti code _or_ that someone is being coy and exaggerates implied difficulties. Are you going to argue with that? Because that was the point you were commenting on.
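One of the "mostly trivial" items mentioned in this comment, timestamps being truncated differently on different filesystems, is easy to show in Python. The sketch assumes a destination with 2-second modification-time granularity (FAT is the usual example) and compares timestamps within a tolerance window, similar in spirit to rsync's `--modify-window` option. The function names are hypothetical.

```python
def truncate_ns(timestamp_ns, granularity_ns):
    """Round a timestamp down to the destination filesystem's granularity."""
    return timestamp_ns - (timestamp_ns % granularity_ns)


def same_mtime(a_ns, b_ns, window_ns=0):
    """Treat two mtimes as equal if they differ by no more than window_ns."""
    return abs(a_ns - b_ns) <= window_ns


TWO_SECONDS_NS = 2_000_000_000

source_mtime = 1_506_124_801_500_000_000          # nanoseconds since the epoch
stored_mtime = truncate_ns(source_mtime, TWO_SECONDS_NS)

print(same_mtime(source_mtime, stored_mtime))                  # False
print(same_mtime(source_mtime, stored_mtime, TWO_SECONDS_NS))  # True
```

Without the window, every file would look modified on every run and get copied again.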


I would have expected two things with the transition to APFS:

--Certain edge cases of HFS+ were no longer supported or would now be supported differently.

--The various special-case file-manipulation APIs in MacOS would have changed.

Both the above tend to happen to some extent with every major rev of MacOS and even some minor revs, and they necessitate substantial code changes. The fact that the old code still "just works" with the APFS change I find surprising.


I was initially planning on a complete rewrite in order to start supporting snapshots in a different way (by using the ones saved by APFS). Going into that was outside the scope of the article.


> And this is how you do SEO-oriented content marketing, kids.

And this in turn, kids, is how you assume bad faith without any reservation or doubt.


Having watched the news for a while, primarily on HN and Slashdot, I've seen enough tracking, leaks, sell-outs, adware incidents, dirty marketing, astroturf campaigns, compromised software, etc. to expect some degree of bad faith. Perhaps not this time, but it is totally reasonable to question everything.


Who said anything about this being written in bad faith?

It's a really well executed example of content marketing. Whether it was intentional is a separate question, but the post is really quite shallow in technical terms and it exaggerates things here and there, so - yeah, it does look like something written primarily for promo reasons.


Sure. There's nothing wrong with promotional material, nor is there anything wrong with calling a spade a spade.

To be fair, though, your original comment did come across as negative — was it supposed to?


>Rather light on the substance, few select links (with at least one pointing out) and lots and lots of precious keywords.

And yet I've learned more from the post than from this comment.

They have like 6 posts on their blog over several years. Hardly the SEO spinsters one would imagine.


"So, before installing 10.13, I started off by googling for any problems experienced by people installing and upgrading filesystems during the process using the public betas. To my surprise there were none."

Well, my quick visit to the developer discussion forums on Apple's website related to the macOS beta quickly revealed a lot of people with problems. It seems like the author didn't look too far:

https://forums.developer.apple.com/community/beta/macos-1013...


That link shows discussions related to problems with the OS but it doesn't list any problems with the filesystem upgrade. If it does, they might have been bumped down by more recent posts but the only major issue (and the person posting it admits that they changed some permissions) is a read-only drive. That seems like pretty good odds.


If you go beyond the first page you will see more, including data loss after a few days of usage and during setup.

I'm not trying to be negative; I was just pointing out the results of my own research into this question when I was considering trying the High Sierra beta. It's not "zero issues" as the original author wrote.


Mind linking to one? I can't find anything that has data loss in the subject and most of the other threads are not about APFS. The only one I can still see is regarding a kernel panic but it doesn't give any details, just an assumption that the issue is with APFS.


Sure.

"I converted my Macintosh HD to the APFS and it will not let me boot it up anymore." https://forums.developer.apple.com/thread/80784?q=apfs

"1TB APFS Container Corrupt, Then Reinit to Single Volume" https://forums.developer.apple.com/message/247781#247781

"APFS Brand new filesystem corruption" https://forums.developer.apple.com/message/145039#145039

"Re: Corrupted APFS disk map error" https://forums.developer.apple.com/message/254468#254468

"Device does not contain a valid APFS container" https://forums.developer.apple.com/message/235888#235888

"Re: Files & file changes disappear on APFS volume" https://forums.developer.apple.com/message/241082#241082

"Re: MacOs High Sierra Beta corrupted my disk" https://forums.developer.apple.com/message/244120#244120

I'm not really interested in debating this further, so if you are not satisfied by these references let's agree to disagree :)


All of those, bar one, are from previous versions (not the latest GM) of High Sierra, and then it's not certain that that was what they were using. So granted, you have managed to find up to one case of what you were saying.


When I first tried installing the second developer (first public) beta, the install failed during the FS conversion, my computer was left in an unbootable state and I had to wipe my drive, reinstall the os (to be clear, the beta and APFS again) and restore from backups. No significant problems after that, but that's sort of a how-did-Mrs-Lincoln-like-the-play type thing.


Yep, there are a ton of bugs in High Sierra. I think this is going to be a risky upgrade for a lot of Macs. From video drivers not being ready to ship to weird issues with permissions and APFS, there is going to be a long tail of misery and issues on the 25th.


>Although too early to declare victory just yet, my anecdotal experiences combined with my research on the web does lead me to think that the mass transition from HFS+ to APFS is going to be another victory for Apple. This opinion differs from all others I have read from well informed people, many of who are predicting either catastrophe or at the very least some major stumbling blocks. In this case, time will tell, but I am bullish where most others are bearish. The only thing that bothers me is that I can’t understand how they managed to pull it off.

Kudos to Apple!


I was hoping to find at least one technical detail of why this is an engineering challenge. But the author simply repeats the same thing over and over: how it's hard, how a competitor says it's hard, how others think it's impossible. In the end it was flawless.


> uses Rsync under the hood

In that case, what difference does the underlying block storage format really make? If rsync didn’t work, general applications could easily be expected to have issues too.


It's the metadata that's really important. Just copying a file, while faithfully preserving metadata, is actually a feat on HFS+. That is why popular tools such as BackupBouncer (to test the integrity of backups) came to be.
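The idea of testing a backup's integrity by checking metadata can be illustrated with a short Python sketch. This is not how BackupBouncer works internally; it is a minimal comparison of the fields `lstat` exposes (type, permissions, size, modification time, owner), and it says nothing about resource forks, extended attributes or ACLs. The names `metadata_differences` and `demo` are invented here.

```python
import os
import stat
import tempfile


def metadata_differences(src, dst):
    """List which basic metadata fields differ between two paths."""
    a, b = os.lstat(src), os.lstat(dst)
    diffs = []
    if stat.S_IFMT(a.st_mode) != stat.S_IFMT(b.st_mode):
        diffs.append("type")
    if stat.S_IMODE(a.st_mode) != stat.S_IMODE(b.st_mode):
        diffs.append("permissions")
    if a.st_size != b.st_size:
        diffs.append("size")
    if a.st_mtime_ns != b.st_mtime_ns:
        diffs.append("mtime")
    if (a.st_uid, a.st_gid) != (b.st_uid, b.st_gid):
        diffs.append("owner")
    return diffs


def demo():
    """Compare an original against a faithful copy and a stale one."""
    with tempfile.TemporaryDirectory() as root:
        paths = {}
        for name in ("original", "faithful", "stale"):
            paths[name] = os.path.join(root, name)
            with open(paths[name], "wb") as f:
                f.write(b"same bytes")
            os.chmod(paths[name], 0o640)
        stamp = 10**18  # nanoseconds since the epoch
        os.utime(paths["original"], ns=(stamp, stamp))
        os.utime(paths["faithful"], ns=(stamp, stamp))
        # The stale copy has identical contents but drifted metadata.
        os.utime(paths["stale"], ns=(stamp + 5 * 10**9, stamp + 5 * 10**9))
        os.chmod(paths["stale"], 0o600)
        return (
            metadata_differences(paths["original"], paths["faithful"]),
            metadata_differences(paths["original"], paths["stale"]),
        )
```

All three files hold the same bytes, so a content-only check would pass the stale copy. Only the metadata comparison tells them apart.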


I think you missed my point, everything rsync does is via the very same system calls any other app could use to originally get or set that metadata. As such, the kernel file system layer abstracts the block storage away from you. This is very different than say Shirt Pocket’s approach (linked to in the article) where they are directly accessing/modifying/creating low level block devices. I’d expect what they are doing to be SIGNIFICANTLY harder. The reason why OP is generally positive is because essentially the only thing they had to do to support APFS is a bunch of testing, unless there’s a part to this story they didn’t tell.


I honestly have no idea what you are talking about. Shirt Pocket would not "access/modify/create low level block devices". They would also be working at the file layer. If you mean that you think that they are implementing their own way of reading the filesystem through the POSIX node, I really don't think that they are, and also it would be crazy to try to do that.


I'm really curious what is going wrong with Bear.app on APFS.

The theming engine breaks on APFS (all the themes look like one particular theme) - but it works fine if you copy the app to a disk image formatted with any other filesystem, including Case-sensitive APFS (so it only breaks on case-insensitive APFS).

Nothing stood out to me in the dtruss output.


> The theming engine breaks on APFS (all the themes look like one particular theme)

I'm not sure if I'm running into the same problem as you, but I just opened up Bear on my Mac to check and applying the themes themselves works for me, however the preview of each theme is the same.

This is on macOS 10.13 Beta (17A306f) and Bear 1.0.4 (3733).

Does each theme look identical when applied for you?

[0]: https://imgur.com/dBswvky

[1]: https://imgur.com/jpHbS4i

[2]: https://imgur.com/LxUWWS3


Yeah, same issue. Pretty jarring for me, because I like the white theme.

Make an HFS dmg in disk utility and move the app over and it will work fine.

Surprisingly, it works fine on "APFS (Case Sensitive)" too (I had thought maybe it was a case-folding issue).

Edit: Oh, for me the theme works for the list of documents, just not the "edit" panel. The two halves are stored in separate files on disk for some reason. (https://i.imgur.com/ZGS3Fp9.png)
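For anyone wanting to rule case handling in or out when debugging something like this, here is a hedged Python sketch. It is not a diagnosis of the Bear problem. It shows two generic checks: probing whether a directory sits on a case-insensitive volume, and finding names in a list that would collide if case were ignored. Python's `str.casefold` only approximates whatever folding rules a given filesystem applies, and the function names and sample file names are made up.

```python
import os
import tempfile
from collections import defaultdict


def is_case_insensitive(directory):
    """Probe a directory: does a differently cased name reach the same file?"""
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as probe:
        folder, name = os.path.split(probe.name)
        return os.path.exists(os.path.join(folder, name.swapcase()))


def case_collisions(names):
    """Group names that differ only by case (under Python's case folding)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(case_collisions(["Light.theme", "light.theme", "Dark.theme"]))
```

Running `is_case_insensitive` against the app's containing volume and `case_collisions` against a bundle's resource names would be a quick first pass before digging through dtruss output.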


For me, the themes look the same in the preview pane. Interestingly enough, changing the theme changes every pane except for the editor pane (which is stuck with one theme). Restarting does not help.

Bear 1.2.5 (4944) and macOS 10.13 (17A362a).


A few people here are disappointed that this post is about a tool that uses rsync. But the "competitor" link is more interesting - it points to the makers of SuperDuper!, a tool that makes bootable external backups. A couple of highlights:

"...it's important to note that Apple still hasn't released any documentation on the "proper" way to create a bootable APFS volume. An example of what they have in mind was released for the very first time when the High Sierra developer release came out a few months ago, but that's it. We basically have to make an educated guess about what they want."

and

"In particular, Apple has further tightened its System Integrity Protection process, and is completely denying access to some files on the startup volume, even when copying to a non-startup volume.... But since APFS is, again, basically undocumented... what could that backup be missing?"

http://www.shirt-pocket.com/blog/index.php/comments/news_on_...


I haven’t tried the GM seed yet, but I’ve reported a pretty annoying bug with inconsistent directory listings from host into guest with Vagrant. A listing in the guest will be missing entries, but you can cd into the missing folders. This causes sites not to work, as the application in the guest can’t see files it requires. The only way to resolve it to date has been to back up and reinstall without APFS.


Having written a backup app several years ago, I struggled to gain a foothold in the market. I sold the app off. After a few years of wanting more from my backup, I started a new backup app. I utilize a number of commands under the hood, but it is an ncurses interface only right now (it still supports drag and drop into a terminal window).

I'd guess that if I wanted to put it out for sale I'd need to add a GUI on top of that.

Can anyone shed some light on why backup apps are struggling with preserving metadata?


Ah yeah okay tx.

I started to be disappointed as soon as I read that he is using rsync.

I was expecting someone who writes something like rsync and not a GUI for rsync.


Thanks, this is a good heads-up. I asked my team to back up anything not checked into our git repo to some disk not controlled by OS X, and by the way, don't rely on your Time Machine: who knows if some bug in the new FS might affect that also. Better safe than sorry. I finished my note with a suggestion to just wait a few months before upgrading to 10.13.


You can also keep your Time Machine disk disconnected during the upgrade so it won’t be touched by the installer. In my opinion not a bad idea anyway.



