Rclone syncs your files to cloud storage (rclone.org)
458 points by thunderbong 9 months ago | 163 comments



Love rclone. IIRC for years the donation page said something along the lines of "rclone is a labor of love and I don't need your money, but if you'd like to help me buy my wife some flowers that would be greatly appreciated."

Just seems like a cool dude. Glad to see he's able to do it full time now, and hope it's still as fulfilling.

EDIT: Thanks to archive.org I found it[0]. Even more wholesome than I remembered:

> Rclone is a pure open source for love-not-money project. However I’ve had requests for a donation page and coding rclone does take me away from something else I love - my wonderful wife.

> So if you would like to send a donation, I will use it to buy flowers (and other pretty things) for her which will make her very happy.

[0]: https://web.archive.org/web/20180624105005/https://rclone.or...


Nick is great. I had a chance to meet him, and he is indeed a great man!


Rclone can also mount cloud storage to local disk, which is especially nice from Kubernetes. Write/read speed isn't the fastest when using it as a drive with lots of files in the same folder, but it's a quick and easy way to utilize cloud storage for projects.

It can also e2e encrypt locations, so everything you put into the mounted drive gets written encrypted to a Dropbox folder like /foo, for example. Nice, because Dropbox and other providers like S3 don't have native e2e support yet.
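
Roughly, with remote names chosen just for illustration (the crypt remote has to be created once with `rclone config`):

    rclone config    # add a remote of type "crypt", call it "secret", pointing at dropbox:foo
    rclone mount secret: ~/vault --vfs-cache-mode writes
Anything written to ~/vault then lands encrypted under dropbox:foo.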

All in all, Rclone is great! One of those tools that is good to have on hand, and it solves so many use cases.


It's also a trivial way to set up an ad-hoc ftp server which serves either local or cloud storage. e.g. my Pi4 runs an rclone ftp server which exposes my dropbox to the rest of my intranet, and a separate ftp server which my networked printer can save scans to. The same machine uses rclone to run automated backups of cloud storage for my household and a close friend. rclone is a godsend for my intranet.


And then all you need is a trivial curlftpfs to have a synchronized folder available. The Dropbox client is not needed anymore?


Oh wow I didn't know about this! Good tip


Do you mind explaining why it's so trivial versus setting up traditional ftp? I'm missing something.

Thank you


> Do you mind explaining why it's so trivial versus setting up traditional ftp?

    nohup /usr/bin/rclone serve ftp --addr MYIP:2121 $PWD &>/dev/null &
No configuration needed beyond rclone's per-cloud-storage-account config (and serving local dirs this way doesn't require any cloud storage config at all), and some variation of that can be added to crontab like:

    @reboot /usr/bin/sleep 30 && ...the above command...
Note that $PWD can be a cloud drive identifier (part of the rclone config), so it can proxy a remote cloud service this same way. For example:

    rclone serve ftp --addr MYIP:2121 mydropbox:
assuming "mydropbox" is the locally-configured name for your rclone dropbox connection, that will serve your whole dropbox.


Just write a systemd unit. These commands are not any easier to support and are far worse from a purely technical point of view. You'll get:

- startup only when the network is up

- proper logging

- automatic restarts on failure

- optional protection for your ssh keys and other data if there's a breach (refer to `systemd-analyze security`)

Run:

  $ systemctl --user edit --full --force rclone-ftp.service
this opens a text editor; paste these lines:

  [Unit]
  After=network-online.target
  Wants=network-online.target

  [Install]
  WantedBy=default.target

  [Service]
  ExecStart=/usr/bin/rclone --your-flags /directory
and then enable and start the service:

  $ systemctl --user enable --now rclone-ftp


Seriously yes. Crontab isn't meant to keep your services up. We have a proper service manager now, out with the hacks.


People go out of their way to build their own crappy version of systemd.

systemd is far from perfect, and Poettering is radically anti-user. But it's the best we've got, and it serves us well.


> Poettering is radically anti-user

What does that mean?


One of the authors of systemd


I know who he is but don't understand how he's supposed to be "anti-user".


Can traditional FTP talk to Dropbox?


Assuming the Dropbox is synchronized somewhere to your file system, the FTP server could serve that directory. Although I guess not everyone synchronizes their Dropbox locally.


> Although I guess not everyone synchronizes their Dropbox locally.

And I've yet to see a Pi-native Dropbox client for doing so.

PS: I actually do sync Dropbox to my main workstation, but selectively. The lion's share of my Dropbox is stuff I don't need locally, so I don't bother to sync it. The rclone approach gives me easy access to the whole Dropbox, when needed, from anywhere in my intranet, and I can use my local file manager instead of the Dropbox web interface.


It is indeed great for this, but you need to make sure your network is stable.

I use it on my desktop and laptop to mount Google drives. The problem on the laptop is that the OS sees the drive as local, and Rclone doesn't time out on network errors. So if you are not connected to wifi and an application tries to read/write to the drive, it will hang forever. This results in most of the UI locking up under XFCE, for example, if you have a Thunar window open.


There is in fact a default timeout of 5 minutes and you can change it: https://rclone.org/docs/#timeout-time

I shorten it to prevent lockups like you are describing.
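
Something along these lines, with an illustrative remote name and values:

    rclone mount gdrive: ~/gdrive --contimeout 15s --timeout 30s --vfs-cache-mode writes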


Thanks, but unfortunately this doesn't work - for my issues at least. I have this (and contimeout) set to 15 seconds, but it makes no difference. I tried those based on another user reporting the same issue here:

https://forum.rclone.org/t/how-to-get-rclone-mount-to-issue-...

The timeout param is listed as "If a transfer has started but then becomes idle for this long it is considered broken and disconnected". This seems to be only for file transfers in progress.

I traced it once, and Rclone gets a "temporary DNS failure" error once the network is down, but just keeps retrying.


Sounds like you have enough for a decent bug report


Similar issues with WebDAV mounts on macOS


> Rclone can also mount cloud storage to local disk

It's not immediately apparent what this means—does it use FUSE, 9p, a driver, or some other mechanism to convert FS calls into API calls?

EDIT: it's FUSE.


Has anyone used it successfully as a replacement for two-way sync apps (like Insync for Google Drive)? Insync sucks, and Google Drive sucks, but for something I depend on every day I feel like I really need to have a local copy of the files and immediate sync between local and server, particularly when Internet access is spotty.


You might want to try Unison: https://github.com/bcpierce00/unison

I've been using it to great effect for over 10 years on a daily basis.


But absolutely 100% remember to block automatic updates, because even minor-minor version updates change the protocol (and even different versions of the OCaml compiler with the same version of the Unison source can mismatch).

This pain has always stopped me using Unison whenever I give it another go (and it's been like this since, what, 2005? with no sign of them stabilising the protocol over major versions.)


The recent updates have stabilised things a whole lot, with nice features like atomic updates.


> Insync sucks...

FWIW, I've been using Insync on Linux since it went online (because Google never released a Linux-native client). Aside from one massive screw-up on their part about 8 or 10 years ago (where they automatically converted all of my 100+ gdocs-format files to MS office and deleted the originals), I've not had any issues with them. (In that one particular case the deleted gdocs were all in the trash bin, so could be recovered. Nothing was lost, it was just a huge pain in the butt.)


> Aside from one massive screw-up on their part about 8 or 10 years ago (where they automatically converted all of my 100+ gdocs-format files to MS office and deleted the originals)

Why. Just why. How does that shit ever happen in a public release?


You probably want syncthing


I have bi-dir sync from/to two computers, one running Windows 10, the other Linux, using Google Drive as a buffer. All data are encrypted client-side because I don't want Google to know my business. But the sync is every 30 min, not immediate. I have no fancy GUI over it, like Syncthing etc., just a plain CLI command triggered by cron and the task scheduler.


Seems like a market opportunity exists for this, especially with Apple cutting external drive support for third party sync tools including Dropbox.


You can still do this, but Dropbox can’t use the File Provider API for that yet, so the experience won’t be quite as integrated as it is with Dropbox for macOS on File Provider. See https://help.dropbox.com/installs/dropbox-for-macos-support for more.


Syncthing maybe?


When you say “e2e” encryption do you mean client-side encryption? Because S3 supports both client and server side encryption. (It doesn’t really need to do anything on the service side to support client-side encryption tbf)

For client side encryption they have a whole encrypted S3 client and everything. (https://docs.aws.amazon.com/amazon-s3-encryption-client/late...)


This seems to be an SDK or library, not a command line tool.


I first used it at work to sync a OneDrive folder from a shared drive due to different audiences. Very cool tool. The open source stuff I really love.


rclone ncdu is my favorite rclone trick https://rclone.org/commands/rclone_ncdu/.

Most cloud storage providers don't show you how much space each folder and subfolder actually occupies. Enter rclone ncdu.
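
Usage is just the remote name from your config (illustrative here):

    rclone ncdu mydropbox: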


I love ncdu and its integration with rclone is super practical!


Wow, this is really nice!


Didn't know about this one; thanks!


I use rclone daily. It's replaced sshfs for some cases. Another use is to push my home server's $Archive share to rsync.net. Another is to pull photos into $Archive from my Google account, my wife's, mom's, dad's, and the kids' accounts. Also my 3 cloud accounts for different $WORK into $Archive (and then to rsync.net).

This is top-tier tooling right here.


> It's replaced sshfs for some cases.

I'd been using sshfs for some years until I learned that rclone can mount remotes to the file system, and I've been using that happily since then.

https://rclone.org/commands/rclone_mount/

> at present SSHFS does not have any active, regular contributors, and there are a number of known issues

https://github.com/libfuse/sshfs#development-status
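
A minimal sketch of the replacement, assuming an sftp-type remote named "myserver" in rclone.conf (names and paths are illustrative):

    rclone mount myserver:/home/me ~/mnt/server --vfs-cache-mode writes --daemon
    fusermount -u ~/mnt/server    # unmount when done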


> another is to pull photos into $Archive from my Google account

I assume you are pulling from Google Photos? If so, then I think the only way to get original quality is to use Takeout?


This I'm not sure of. I also have a habit of using Takeout quarterly (but this blob is not merged). I think I should check these configs; I want the full fidelity.


Sadly, it’s not configurable. It’s just an inherent limitation of the API (1). Takeout is the best alternative, but for something more realtime you can use tools (2) that wrap the browser UI which also exports full quality.

(1) https://github.com/gilesknap/gphotos-sync#warning-google-api...

(2) https://github.com/JakeWharton/docker-gphotos-sync


Running headless Chrome and scraping the photos via dev tools sounds like a good way to get your account banned by some automation, with no human support to help you :/

It's really stupid that the API doesn't support this. For now I'll stick to regular takeout archives. Syncthing directly from the phone might be a better option for regular backups.


rsync.net also offers slightly cheaper plans if you don't need all the features/support, just Borg archives: https://www.rsync.net/products/borg.html


Take a look at Restic for backups. Rclone and Restic play together really nicely.

https://www.bobek.cz/blog/2020/restic-rclone/
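
Restic can use any rclone remote as its repository via its rclone backend; a minimal sketch, assuming a remote named "b2remote":

    restic -r rclone:b2remote:my-backups init
    restic -r rclone:b2remote:my-backups backup ~/Documents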


The main disadvantage with pure Restic is that you usually have to end up writing your own shell scripts for some configuration management because Restic itself has none of that.

Fortunately there is https://github.com/creativeprojects/resticprofile to solve that problem.


Resticprofile is fantastic, makes it much easier to set up


Also, consider making backups append-only as an extra precaution: https://ruderich.org/simon/notes/append-only-backups-with-re...
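
One way to get that when rclone is in the loop is rclone's built-in restic REST server, which has an append-only mode; a rough sketch with an illustrative remote name:

    rclone serve restic --addr localhost:8080 --append-only b2remote:my-backups
    restic -r rest:http://localhost:8080/ backup ~/Documents    # after an initial init against the same URL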


The restic team refuses to have unencrypted backups just "because security".

I hate such an approach when someone assumes that whatever happens their opinion is the best one in the world and everyone else is wrong.

I do not want to encrypt my backups because I know what I am doing and I have very, very good reasons for that.

Restic could allow a --do-not-encrypt switch and backup by default.

The arrogance of the devs is astonishing and this is why I will not use Restic, and I regret it very much because this is a very good backup solution.

Try Borg.


Mind sharing the reasons?


Backing up locally or to a NAS, and not having to worry about losing the keys and, with them, all of your backups.

I'm using duplicity because none of these can sync to an rsync server that I run on my NAS.


You listed the main reasons, thanks.

As for remote backups - I use ssh with Borg and it works fine. If this is a NAS you can probably enable ssh (if it is not enabled already).

BTW for my remote backups I do encrypt them but this is a choice the author of Borg left open.

There are other issues with Borg, such as the use of local timestamps (naive date format, no timezone) instead of a full ISO 8601 string, and the lack of a way to ask whether a backup is complete (which is a nightmare for monitoring) because the registry is locked during a backup and you cannot query it.


I use both.

Restic backs up various machines hourly to a local server. Rclone syncs those backups to S3 daily.

Keeps the read/write ops lower, restores faster, and limits S3 access to a single machine.
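
That kind of setup boils down to two cron entries on the local server, roughly like this (paths, remote name and schedule are illustrative):

    0 * * * *  restic -r /srv/restic-repo --password-file /root/.restic-pass backup /srv/data
    30 3 * * * rclone sync /srv/restic-repo s3remote:my-backup-bucket/restic-repo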



I moved away from Duplicacy to restic when restic finally implemented compression.

Duplicacy is slower and also only free for personal use.


Really tough for me to decide whether to use restic or Kopia. Instead I use a way worse solution because I'm afraid I'll mess something up.


Like with all contingency systems it's necessary to do a "fire drill" exercise from time to time. Pretend that you lost your data and attempt to recover a few files from backups. Pretend that your house burned down and you can't use any item from home. This will give you confidence that your system is working.


I use restic for everything Linux or Windows, except for Macs. There I use Kopia. It just works. There is not much to mess up really. Just keep your password safe and be sure to test your backups.


I back up TBs with restic, never had issues. When I need a file, I mount the remote repository. The mount is fast with S3.

The rclone backend means I can backup anywhere.


Rclone has a large selection of storage backends [1] plus a crypt backend [2] for encrypting any of the storage backends.

[1]: https://rclone.org/overview/

[2]: https://rclone.org/crypt/


I appreciate how its home page proudly calls out that it always verifies checksums. I've been boggled in the past how many tools skip that step. Sure, it slows things down, but when you're syncing between cloud storage options, you're probably in a situation where you really, really want to verify your checksums.


I thought the “indistinguishable from magic” testimonial at the top was a bit arrogant, but am currently trying to find out if these extraordinary claims have extraordinary evidence ;)


Fun fact: somebody reverse-engineered the Proton Drive API by looking at their open source client and made a plugin for rclone.

This is currently the only way to get Proton Drive on Linux.


Oh! I didn't know there was support for Proton Drive in rclone. I'm on a Visionary plan since they re-opened that for a while, so now I get to use the storage too. Thanks :)


Rclone is a magic tool, I've used it for many different use cases.

When I last checked, it doesn't use the AWS SDK (or the Go version of it is limited). Anyway, it isn't able to use all the settings in .aws/config.

But it is kind of understandable that it doesn't support all backend features because it's a multifunctional tool.

Also, the documentation is full of warnings about unmaintained features (like caching) and experimental features. Which is fair warning, but they don't specifically tell you the limitations.


I'm definitely a Rclone fan; it's an invaluable tool if less technical folks give you lots of data in a consumer/business cloud and you need to somehow put that data onto a server to do further processing.

It's also great for direct cloud-to-cloud transfers if you have lots of data and a crappy home connection. Put rclone on a server with good networking, run it under tmux, and your computer doesn't even have to be on for the thing to run.


> Put rclone on a server with good networking, run it under tmux

tmux isn't strictly necessary:

    nohup /usr/bin/rclone ... &>/dev/null &
then log out and it'll keep running.


That way doesn't let you check on your progress, right? Screen / Tmux let you re-attach to the session at any time and see what's going on, if there were errors etc.


> That way doesn't let you check on your progress, right

It does if you redirect the output to a file other than /dev/null, but in my experience checking on the progress is irrelevant - it's done when it's done.


There is a built-in web GUI (experimental), and I also found the RcloneBrowser project, which looks helpful when a GUI is handy.

https://kapitainsky.github.io/RcloneBrowser/

https://github.com/rclone/rclone-webui-react


Last commit 3 years ago


What cloud storage provider officially supports this tool or the protocol used by it?

The instructions look mostly like a reverse engineering effort that can fail anytime for any reason. Of course you'll get some notification, but it still seems like something to avoid in principle.


They generally use official APIs, which most backends have.

I’ve been using rclone for 3 years without issues. Across many different backends.


What if we looked at all of the cloud providers' APIs, which may or may not be similar to one another, may be in constant flux, and may have undocumented quirks, and hid all that complexity inside of one black box that provided a single clear API? The guts of that black box might be kinda ugly, and might need a lot of constant updating, but for a lot of users (including me), that is more appealing than dealing with all the raw complexity of the various cloud providers' "official" APIs.


Most providers support S3 compatible APIs which rclone can then use to talk to them. For example, S3, Cloudflare R2, Backblaze, ...


I've been using it with Vultr as well without a hitch.


We have rclone built into our environment:

  ssh user@rsync.net rclone blah blah ...
... so you don't even need to install rclone for some use-cases - like backing up an S3 bucket to your rsync.net account:

https://www.rsync.net/resources/howto/rclone.html


There is no rclone protocol, only whatever protocols already exist. Most providers will have their version of S3, so if a provider wants to implement a strict copy of the AWS API they will be compatible. They can also provide an SFTP API and rclone will work.


It's a wrapper around a bunch of APIs. If that makes you nervous, then wait until you hear about how the rest of the modern web is strung together. It's been working reliably for me for a decade.


rsync.net


> What cloud storage provider officially supports this tool or the protocol used by it?

Man I wish I lived in a universe where providers actually supported tooling. You're completely delusional tho.


Not delusional. rsync.net supports low-level tools like scp, and also this. And their team is great for engineer-level questions.


That's wonderful! Full endorsement of the support of rsync.net.

I sincerely doubt most tools will be supported by most providers (by usage), however, and I question the evaluation of tools by this metric. Support is generally a fairly bad metric by which to evaluate tooling—generally speaking, if you care which client is consuming an API someone has done something horrifically wrong.


Wot? Support is THE metric.


...for the client tooling, or for the service? The service is understandable but the tooling makes no sense whatsoever.


Restic (rclone) + B2 + systemd has solved backups for my system. Mainly the home dir of my Linux desktop. Never looked at anything else.


Just browsing the list of supported providers and such makes me a bit overwhelmed. There are so many tools that are kinda similar but slightly different, that from "wow, rclone seems like a great tool" I quickly come to "what should I even use?" Rclone, restic, borg, git-annex… Is there a case to use several of them together? Are there, like, good comprehensive real-life case studies to see how people use all that stuff and organize vast amounts of data (which are kinda prerequisite for anything like that to be used in the first place)?


Rclone is the general purpose "swiss army knife". It can be used to read/write/sync/copy/move files between any supported service and your local device, and also mount remote storage as a local drive. Restic and Borg are for making backups only.


> It can be used to read/write/sync/copy/move

To help avoid getting anyone's hopes up: rclone does not do automatic two-way sync. In rclone parlance, "sync" means to either pull all files from a remote and make a local copy match that, or do the opposite: push a local dir to a remote and update the remote to match the local one. Note that "match" means "delete anything in the target which is not in the source."
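
If you do use it that way, it's worth previewing first and keeping a safety net for deletions; a minimal sketch with illustrative names:

    rclone sync ~/Documents mydropbox:Documents --dry-run
    rclone sync ~/Documents mydropbox:Documents --backup-dir mydropbox:Archive/$(date +%F)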


There is rclone bisync.


New user to rclone and I am wondering if the setup I want can be done through rclone.

I want to have a local directory in my computer that bi-directionally syncs to both Google Drive and DropBox as well as to my Synology NAS. There will be two Google Drive accounts (one for me and one for my wife) and one Dropbox account. I also have two Linux workstations running Ubuntu 22.04 and two Macbooks where I want the local directory to exist.

Basically anytime we put anything in either the cloud drives or the NAS or the local directories in the respective computers, it should get synced to all the other destinations.

There could be other stuff in the cloud drives or the NAS that I would just have the tool ignore. The sync should just be for that one specific folder. I do not want to deal with separate local folders for each cloud storage solutions.

Is this feasible? I am okay to run cron tasks in any of the computers; preferably the Linux ones.


Unfortunately I have no first-hand experience with bisync mode. I only use rclone for pushing backups to clouds, where it shines.


> There is rclone bisync.

Oooh, nice. That's not _quite_ the same as real-time two-way sync, but I guess it's the next best thing.

<https://rclone.org/bisync/>
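
A sketch of how it's used (paths and remote name are illustrative): the first run needs --resync to establish a baseline, then later runs can go in cron:

    rclone bisync ~/Sync mydropbox:Sync --resync
    rclone bisync ~/Sync mydropbox:Sync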


I use Borg to backup my stuff to a local disk repository, which I then synchronize -- encrypted -- to several cloud accounts, and that is done by rclone.


I use rclone and git-annex together (there's a git-annex special remote plugin for rclone). Rclone handles interfacing with the various cloud providers, while git-annex tracks where the files are.


I just discovered rclone last week. I have a nice multifunction color laser printer with a scanner, and I want to use it to scan to the cloud without having to use my computer, but it's one model before they introduced that feature. It only does FTP.

Turns out rclone can present an FTP server proxy that is backed by any storage you want. I put it in a little Docker container and now I don't have to spend $400 on a new scanner.
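
Roughly what that container looks like, using the official rclone image (remote name, ports and credentials are placeholders):

    docker run -d --name scan-ftp \
      -v ~/.config/rclone:/config/rclone \
      -p 2121:2121 -p 30000-30009:30000-30009 \
      rclone/rclone serve ftp mydropbox:Scans \
      --addr :2121 --passive-port 30000-30009 --user scanner --pass changeme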


This is really a great tool. You can also link against librclone if you need to use it as part of a larger application or do something more complex with it than scripting.


I use rclone on VM instances to sync files across Google Drive, Google Photos, OneDrive, S3 and GCP object storage.

Running migrations on the server side is faster and more reliable. I can monitor transforms and transfers in tmux and then run quality checks when it's complete.

And having a VM lets me filter and transform the data during the migration, e.g. pruning files, pruning git repos, compressing images.

There’s a framework waiting to be made that’s like grub but for personal data warehouse grooming like this


I have used rclone for several years now and it works so well. I use it in place of rsync often too. It's great for moving files on local file systems too.


I’m a very occasional rclone user; I use it for one-off uploads of many files to some cloud, or syncing from cloud to cloud. Sometimes this is over shaky connections which can lead to many failures, so I love that you can just keep issuing the same sync command and it will continue from the point the sync broke.


I gave up on the OneDrive client and switched to rclone to access SharePoint at work. It's more manual but it always works! The OneDrive client was great when it worked, but when it started misbehaving there was no real way to fix it besides reinstalling and starting over.


I used `rclone` in one startup to add the storage services the biz people were using, to engineering's easy & inexpensive offline "catastrophic" backups.

I used the normal ways of accessing AWS, GitLab, etc., but `rclone` made it easy to access the less-technical services.


An item on my todo list reads "Setup a backup system using Borg". I didn't know about Rclone but it seems very good, so now I want to rename my todo list item as "Setup a backup system using Rclone". How would you compare these 2 solutions?


rclone is not a proper backup tool. It's like an rsync that integrates with all the clouds. You can kinda use it as one though. I had Borg in my todo for a long time too -- experimented with it and restic which are proper backup tools -- they are a little more involved than rclone (and scary, as they can get corrupted and you are essentially reliant on the software to get your files back). I found rclone much simpler to work with. As they always say, any backup is better than no backup!

The simplest thing you can probably do is use rclone to copy your most important files to a B2 bucket. Enable Object Lock on the B2 bucket to ensure no deletion, just to be safe. You can then run rclone on a server and from your devices with cron jobs to archive your important stuff. This is not a proper backup, as I said: if you rename files or delete unwanted stuff, the old copies won't be removed from the backup bucket, but it's usable for stuff like photos and the like, anything you don't want to lose.

(I lied, simplest thing is actually probably just copying to an external hard drive, but I find having rclone cron jobs much more practical)
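
For what it's worth, the cron job can be as small as this (paths and remote name are made up; "b2crypt" would be a crypt remote wrapping the B2 bucket):

    0 2 * * * /usr/bin/rclone copy /home/me/Photos b2crypt:photos --log-file /var/log/rclone-photos.log --log-level INFO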


Thank you so much! It really helps. My data is on a Synology NAS. It seems B2 bucket support is built into the Hyper Backup Synology software (the one whose purpose is to perform backups); I'm not sure if I am going to choose that or Rclone. But I already found some resources (https://www.reddit.com/r/synology/comments/hsy29y/hyper_back...), I'll investigate that. Do you, by chance, also perform your backups from a Synology NAS?


I have been using rclone for over two years now. Typically, I run cron to copy any new files in the directories that are most important to me. The combination of rclone and B2 storage is quite effective, especially considering its ease of use and cost efficiency.


Set it up a year ago. It's been running every day without an issue, untouched for 365+ days.


I have been using this for years with my cloud provider and this is a rock solid application. My only wish is that the logging could have a sweet spot in the middle where it is not verbose but does show the names of the files changed.


I used Rclone at a major media company to move 4 PB of media files (so far) out of a legacy datacenter over a 200 GB link. I used 18 workers and we were able to saturate the link consistently. Good stuff.


I tried using it to encrypt my cloud backups, and it works great on the desktop, even if it requires running commands.

My gripe with it is being able to use and sync from my smartphone. At least on iOS, there is no robust tool that allows it AFAIK. There is CryptCloudViewer (https://github.com/lithium0003/ccViewer), which has not been updated since 2020.

I'd appreciate any suggestions or more details of your workflow for these scenarios.


I use this to replace rsync because it does some things significantly better.


up there with ffmpeg in terms of usefulness


I literally picked up rclone for the first time last weekend and have been running with it. Great tool!


rclone also has a "universal" file explorer. I haven't found any other such explorer.


I am done with the cloud as a backup. I use ZFS incremental replication of snapshots, automated with Sanoid and Syncoid, to my own off-site box. So glad I don't have to mess with the file level again. Syncoid in pull mode does not even need to know the ZFS encryption keys (zero trust!).


If you are only using those two tools, then you only have a system for replication (and snapshots), but not a backup system.

If there is a data corruption bug in ZFS, it will propagate to your remote and corrupt data there.

I hope you have something else in place besides those two tools.


Yes (although ZFS is pretty stable, it is always good to remember not to put all your eggs in a single basket).

My fallbacks are:

- an external drive that I connect once a year and just rsync-dump everything

- for important files, a separate box where I have borg/borgmatic [1] in deduplication mode installed; this is updated once in a while

Just curious: Do you have any reason to believe that such a data corruption bug is likely in ZFS? It seems like saying that ext4 could have a bug and you should also store stuff on NTFS, just in case (which I think does not make sense..).

[1]: https://github.com/borgmatic-collective/borgmatic


Good further comment on the subject [1].

[1]: https://www.reddit.com/r/zfs/comments/85aa7s/comment/dvw55u3...


It's funny that you link to a comment from 6 years ago. Just a month after that there was a pretty big bug in ZFS that corrupted data.

https://github.com/openzfs/zfs/issues/7401

Corresponding HN discussion at the time: https://news.ycombinator.com/item?id=16797644


Yes, I read that. It underpins the 3-2-1 rule: 3 copies, on 2 different media (where ZFS can be one of the two), 1 off-site.

I think it makes sense and thank you for the sensible reminder.


I am being downvoted, maybe I should explain myself better:

Most filesystems from x years ago do not cope well with current trends towards increasing file numbers. There are robust filesystems that can deal with petabytes, but most have a tough time with Googolplexian file numbers. I speak of all the git directories, or the venv folders for my 100+ projects that require all their unique dependencies (a single venv is usually 400 to 800k files), or the 5000+ npm packages that are needed to build a simple website, or the local GIS datastores, split over hundreds and thousands of individual files.

Yes, I may not need to back those up. But I want to keep my projects together, sorted in folders, and not split by file system or backup requirements. This means sooner or later I need something like rsync to back things up somewhere. However, rsync and colleagues will need to build a directory and file tree and compare hashes for individual files. This takes time. A usual rsync scan on my laptop (SSD) with 1.6 million files takes about 5 minutes.

With ZFS, this is history and this is the major benefit to me. With block suballocation [1] it has no problems with a high number of small files (see a list of all filesystems that support BS here [2]). And: I don't have to mess with the file level. I can create snapshots and they will transfer immediately, incrementally and replicate everything offsite, without me having to deal with the myriad requirements of Volumes, Filesystems and higher level software (etc.).

If I really need ext4 or xfs (e.g.), I create a ZFS volume and format it with any filesystem I want, with all features of ZFS still available (compression, encryption, incremental replication, deduplication (if you want it)).

Yes, perhaps this has nothing to do with the _cloud_ (e.g. rsync.net offers ZFS snapshot storage). But the post was about rclone, which is what my reply was responding to.

[1]: https://en.wikipedia.org/wiki/Block_suballocation

[2]: https://en.wikipedia.org/wiki/Comparison_of_file_systems


rclone is so cool! https://pgs.sh is using similar tech but focusing on a managed service specifically for static sites.


Rclone solves so many problems for me it's crazy.


A shame the features don't indicate it can encrypt at rest.


My Dad left us 30 TB of data when he passed. I trimmed it down to 2 TB and tried to use Google's desktop sync app to upload it to the cloud. It ran on my work computer during COVID for 8 months straight before finishing.

When I tried to back up that and my other data to a hard drive, Google takeout consistently failed, downloading consistently failed. I went back and forth with support for months with no solution.

Finally I found rclone and was done in a couple days.


> Google takeout consistently failed, downloading consistently failed

I get extra angry about situations like this. I have paid Google many thousands of dollars over the years for their services. I just assume that one of the fundamentals of a company like that would be handling large files even if it’s for a fairly small percentage of customers. I get that their feature set is limited. I kind of get that they don’t provide much support.

But for the many compromises I make with a company that size, I feel like semi-competent engineering should be one of the few benefits I get besides low price and high availability.


In the past few years it has been a leech of brain power that's optimized itself into producing nothing of value except demos that get abandoned immediately. From what I read, they seem to have managers and executives with bad incentives and too much power. So it seems that it doesn't really matter how competent the engineer is, their work goes into numerous black holes in the end.


> We should also remember how a foolish and willful ignorance of the superpower of rewards caused Soviet communists to get their final result, as described by one employee: “They pretend to pay us, and we pretend to work.” Perhaps the most important rule in management is “Get the incentives right.”

-- Charlie Munger, Poor Charlie's Almanack ch. 11, The Psychology of Human Misjudgment (principle #1)


Why would you expect that? It is not profitable to put extra work into serving a small proportion of customers. It is also not profitable to put work into helping customers stop using your services.

It is something providers do to keep regulators off their back. They are not going to put money into making it work well.


You can't create a product or service where every single feature is profitable on its own. If a feature exists, I expect it to work.


If a feature does not increase profits why would you put money into developing and maintaining it?


A feature that makes it easier to switch will also make it less risky to sign up in the first place.

Giving away a feature that is a competitor's cash cow could weaken that competitor's stranglehold on customers you want to connect with or drive the competitor out of business.

At a particular point in time, increasing market share could be more important for a company than making a profit.

It could be culturally uncommon to charge for some essential feature (say bank accounts), but without that feature you wouldn't be able to upsell customers on other features (credit cards, mortgages).

Ad funded companies give away features and entire services for free in order to be more attractive for ad customers.

Of course, if a feature is bad for a company in every conceivable direct and indirect way, even in combination with other features, now and in the future, under any and all circumstances, it would not introduce that feature at all.

Introducing a completely broken feature is unlikely to make much sense. Introducing lots of low quality features that make the product feel flaky but "full featured" could make some sense unfortunately.


> A feature that makes it easier to switch will also make it less risky to sign up in the first place.

Yes, if buyers think that far. Consumers do not. Some businesses may, but it's not a major consideration because the people making the decision will probably have moved on by the time a switch is needed.

> Ad funded companies give away features and entire services for free in order to be more attractive for ad customers.

Yes, but that means those services do make a profit.

The same applies to the banking example but they can make money off free accounts as well.

> Introducing a completely broken feature is unlikely to make much sense. Introducing lots of low quality features that make the product feel flaky but "full featured" could make some sense unfortunately.

The latter is similar to what I am suggesting here. Being able to export data will probably satisfy most customers who want to do so, even if it does not actually work well for everyone. It will also mollify regulators if it works for most people. If they can say "data export works well for 95% of our customers", regulators are likely to conclude that that is sufficient not to impede competition.


>Yes, if buyers think that far.

They absolutely do. It's hard not to, because migrating data is the very first thing you have to think about when switching services. It's also a prominent part of FAQs and service documentation.

>Yes, but that means those services do make a profit.

Of course. What I said is that not every single feature can be profitable on its own. Obviously it has to be beneficial in some indirect way.


Devil's advocate: egress prices are always extremely high, so if you use a service of theirs that doesn't pass that cost on to you, it means they've factored in that you won't be downloading that much. Putting obstacles in the way of actually doing it is, for them, the cost-effective approach, and if you want actual full access you're supposed to use their cloud storage.

But that's only one possible reasoning.


Inability to download from the cloud is happening to me with every provider. Proton from the web, a OneDrive Cryptomator vault through Cyberduck. I'll have to use rclone, I guess.


Every time I see this pop up on HN, I upvote. I've been using rclone for years for my main backup [1] (now over 18 TB) from my home NAS to Amazon S3 Glacier-backed Deep Archive, and it is by far the easiest backup setup I've ever had (and cheap—like $16/month).

[1] https://github.com/geerlingguy/my-backup-plan
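
The core of it is an rclone sync with the S3 storage class set to Deep Archive; a rough sketch (paths and bucket name are illustrative, the real scripts are in the linked repo):

    rclone sync /mnt/nas/archive s3:my-deep-archive-bucket --s3-storage-class DEEP_ARCHIVE --transfers 8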


Egress costs are significant by the time you want to use your backup: around $1800 to download the 18 TB, which is around 10 years of storage cost. If you assume 5 years of HDD data lifetime, you need to roughly triple that monthly cost, which is not too bad, but you can buy an enterprise 20 TB drive per year for that cost. Personally I would only use AWS in a way where I would almost never have to pay for egress for the entire dataset except when my house burns down, as a last-resort backup instead of a main one.


That's exactly what I'm using it for :)

I have a primary and backup NAS at home, and a separate archive copy updated weekly offsite locally. Then Glacier is my "there was a nuclear bomb and my entire city is completely gone" solution.

And in that case, assuming I had survived, I would be willing to pay most anything to get my data back.


Nice, that's a great point and a good use case. I normally try to stay away from AWS (when I have to care about the cost) but I think you've found a case where it makes sense!

I've already got backblaze (behind rclone) set up for my backups so adding a glacier archive would be quite easy. Thanks!


Wouldn't Backblaze be much cheaper for that?


Last time I priced it out, Backblaze was more expensive per GB — note that I'm using Glacier Deep Archive, which is something like an order of magnitude less expensive than plain Glacier-backed buckets.

It also incurs a delay of at least 6-12 hours before the first byte of retrieval occurs, when you need to restore. So there are tradeoffs for the price.


Oh, I see. I'm shamefully ignorant about everything concerning backups, so I just googled "Amazon S3 Glacier" and compared some numbers that pop up to some other providers.

> a delay of at least 6-12 hours before the first byte of retrieval occurs

Huh. I wonder how it actually works behind the curtains. Do they actually use HDDs to store all of that, or maybe is there some physical work involved in these 12 hours to retrieve a tape drive from the archive…


From what I remember, Deep Archive uses tape libraries, and retrieval requires a robot fetching tapes quite often, leading to that delay.


Pro tip is to create a Google Workspace org and pay the $18/mo per 5TB of Drive storage to also get no egress fees. With rclone, the process is fairly simple especially when using a service account to authenticate (so no oauth revocations or refreshing).


Yep, if costs are a concern (which for any personal backup plan they should be) then stay away from AWS. Backblaze is IMHO the best option


you might be interested in https://restic.net :)


The two work together well too - I prefer to let Restic back up to a separate local (magnetic) HDD for my repo, and then use rclone to sync the local repo to Backblaze B2. Compared to a direct cloud repo set up in Restic, it gives me much better restore times and lower costs for the common case where your local site is not completely destroyed (e.g. data corruption etc...), while still giving you the safety of a slower and more expensive off-site backup for when you really need it.


This is a great landing page IMHO. The first item is not some fancy graphic, but a short paragraph explaining exactly what it is. I had a vague idea about rclone already, but it took about 5 seconds of reading to get a good understanding about its capabilities and how it could help me. A much better sales pitch than having to fight through 'Get started' or Github links to find basic info.


So I see this type of headline happen a lot, and despite being a long time reader and short time commenter, is there a reason the author doesn't make a more specific headline? Rclone has existed forever and has been mentioned here over a thousand times (https://www.google.com/search?client=firefox-b-1-d&q=rclone+...)

Most of the time the poster links to a specific news item at least (which is also not okay without a specific headline) but sometimes the link just points to the home page.

Regardless, it has been mentioned before.

EDIT: Just to be clear, I'm not against an occasional mention of a project. Hacker News has allowed me to find some real gems, but most of those posts are new features added to the software, not just a generic link to the homepage.


I think it’s meant to generate upvotes on the post and also meant to encourage discussions.

Like you mentioned, some real gems are found, and for me are often found in the comments. So, these reposts are helpful.

It’s like how many posts were made for “What do you use for note taking?” Some gems are in the comments.


Reposts are fine after a year or so and there hasn't been one that recent. Titles are usually the titles of the pages linked, as is this one.

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...


Some people might end up in the 10,000 (https://xkcd.com/1053/)




If it isn't newsworthy, why is it on Hacker News?

This is discussed in the site docs: it's fine to get news from HN, but HN is not really about news.


Fair enough - as a user, that prompts me to ask myself why I'm using Hacker News for tech news when most of the content I see daily isn't actually news.

I guess I will just be more aggressive in hiding posts that I didn't want to waste my time reading.


because learning is FUNdamental!


[flagged]


Says that Apple doesn't provide a public web API for it.

There's a ticket covering everything you might ever want to know:

https://github.com/rclone/rclone/issues/1778

Seems like rclone would support it if Apple ever created an official way to do so.


I don't use rsync, because it's for synchronization and I don't really trust any of my storage drives to hold a perfect copy of data to sync from and act as an ultimate source of truth for connected, downloading clients. I especially don't really trust flash memory storage devices.

The only source of truth that is really reliable is a password memorized in your brain. Because keyfiles that can decrypt data can be corrupted, that is to say, they rely on a secondary data integrity check. Like the information memorized in your brain.



