Rclone – Sync files and directories to many cloud storage providers (github.com/rclone)
448 points by peter_d_sherman on April 6, 2020 | 100 comments



I'm an occasional contributor to Rclone. It holds a special place for me as the first open source project I got seriously involved in, and it made me far more comfortable working in open source.

Rclone is now very prevalent in my infrastructure. Almost all my websites are updated by a CI job that builds the website from a repo and pushes it up to the hosting server. There's an encrypted Rclone config in the repo, and the password for it is a really long randomly generated string saved in the CI as a secret. Rclone with Restic is how most of my servers get backed up, Rclone is how I access my Nextcloud and Google Drive, I have a containerized S3-compatible storage system that actually stores its data on an Rclone remote (I hope `serve s3` gets implemented soon so this setup can be simpler [1]), and much more.

I'm using it so much that I'm running up against the Google Drive API limit even though I'm using my own key.

* [1] https://github.com/rclone/rclone/issues/3382
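
If anyone is curious what the CI step looks like: a rough sketch. RCLONE_CONFIG_PASS is the variable rclone reads for its config password; the CI secret name and the `webhost:` remote here are placeholders.

    export RCLONE_CONFIG_PASS="$CI_RCLONE_CONFIG_PASSWORD"    # decrypts the encrypted rclone.conf
    rclone sync ./public webhost:/var/www/site --config ./rclone.conf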


You may benefit from rotating service accounts to sidestep the API limits. I haven't implemented this on my own setup yet, but I know that it is scriptable. Implementation differences prevent one overarching tutorial, but I left some breadcrumbs for you below.

https://github.com/l3uddz/cloudplow

https://github.com/Rhilip/AutoRclone

https://hub.docker.com/r/hotio/rclone


Is it better to integrate with the Go API, or to write files to a local folder and run rclone on that folder to upload them?

I'm planning to upload files from my app and let an rclone cronjob occasionally sync between my backends.


I did not know that Restic was built on top of Rclone, but of course that makes a lot of sense. Thanks for contributing to a project that has made my life much easier!


rclone is just one of the backends you can use for restic. Personally, I find it far more performant to use a local disk for your restic repo (and keep the desired number of snapshots, etc. there) and then use rclone to clone that repo to a cloud provider. I used Backblaze's B2 API as it is generally cheaper.
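
A minimal sketch of that pattern (the repo path and bucket name are placeholders):

    restic -r /backups/restic backup /home                                    # back up into the local repo
    restic -r /backups/restic forget --keep-daily 7 --keep-weekly 4 --prune   # local retention
    rclone sync /backups/restic b2:my-restic-bucket                           # mirror the repo to B2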

restic is somewhat sensitive to the latency of the backend it uses.


I do the exact same thing.


Here is what you can do with 'rclone' and immutable ZFS snapshots - second screenshot:

http://rsync.net/products/universal.html

The workflow here is a little different than at Amazon or Backblaze, as we actually built the rclone binary into our environment.

So you can run 'rclone' without actually having it installed yourself:

    ssh user@rsync.net rclone copy s3:/some/bucket /rsync.net/dir
... and since ZFS snapshots at rsync.net are immutable (read-only) your account accumulates point-in-time versions of your data that are immune to bad actors like ransomware or Mallory in her various forms.

You can also, if you choose, do nothing but "dumb syncs" with rclone since retention can be handled with an arbitrary day/week/month/quarter/year snapshot schedule.


Wow this is a true OG account. Good shit btw, keep up the good work!


Rclone also has a great encryption feature¹ which lets you add a crypto layer on top of any of the cloud storage options, relatively easily hiding all file names and contents. With its cross-platform capabilities, you can have unified, secure storage across operating systems, secured from the underlying storage provider as well as over the wire.

¹ https://rclone.org/crypt
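
As a rough illustration (the remote names here are made up): once you've run `rclone config` and wrapped an existing remote, say gdrive:encrypted, in a crypt remote called secret:, the usual commands transparently encrypt and decrypt:

    rclone copy ~/documents secret:documents    # names and contents are encrypted before they reach gdrive
    rclone ls secret:documents                  # listings are decrypted on the fly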


This is a great feature. The one drawback I've run into is that the password can't be changed after the fact, meaning I can't rotate credentials without creating a new remote and copying all of the data over. Does anyone know of a workaround for this?


Unfortunately not at the user level. At the software level, though, it should generate a random 128-bit (or whatever) key, store that on the server, and encrypt it with your KDF-strengthened password.

That way, you can change the password by just re-encrypting that file, though it doesn't help much if an attacker has already obtained the decrypted key; you'd have to re-upload all the data in that case.
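
For illustration only - this is not how rclone's crypt backend currently works - the key-wrapping idea looks roughly like this with a reasonably recent openssl:

    head -c 32 /dev/urandom > data.key                                  # random data key that actually encrypts the files
    openssl enc -aes-256-cbc -pbkdf2 -in data.key -out data.key.enc     # wrap it with a password-derived key (prompts for the password)
    # rotating the password means re-wrapping data.key.enc only; the bulk data stays untouched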


It looks a bit more metadata-leaky than cryfs[1].

[1] https://www.cryfs.org/


Also gocryptfs, which is a tiny bit more leaky but apparently much more performant and a bit more featureful

https://nuetzlich.net/gocryptfs/comparison/


I used to use rclone to keep my files in sync with Amazon (1TB) Cloud Drive (not S3, although it uses S3 infrastructure). But Amazon decided to pull support for rclone and all third-party access to Cloud Drive (except odrive): they revoked their API keys and decided not to hand out any more by shutting down the developer program. The result is that the only way to move files into or out of Amazon Cloud Drive is to either use odrive or use their web interface (which really is a pain). I'm now moving all of my 350+ GB of data out of Amazon into Backblaze (which supports rclone).

So while rclone technically "supports" Amazon Cloud Drive, you can only use it if you already have your own API key that hasn't been revoked.


If they're listing Cloud Drive as a supported backup solution, that seems sort of suspect, because it was never intended to be used by backup programs like this to store TBs of files, and I believe they started cracking down on that long ago. If you want to use AWS for backups, they want you to be using the "real" S3 services.


That's long-solved, since they kicked everyone with lots of data off and switched it from unlimited to 1TB. At this point there's no good reason to restrict people from using something like rclone.

If people switched to the 'proper' backup service they offer on AWS, that would be glacier, and they would get significantly less money!


You probably already know this but you'll probably be very happy with Backblaze B2.


I love rclone, but a warning: it's really chatty with S3 at default settings. I have a backup script that was on its way to a $60/month S3 bill, 99% of which was driven by the number of requests, not storage space or egress.


S3 is also super slow when you have lots of small files as many developers do. Google Drive performs better and is cheaper.


Google drive doesn't do versioning (at least, not for the 180 day period I want), which is why I rejected it as a backend.

Regardless, I haven't had S3 perf be a problem yet for me.


I love Rclone and have been using it for years (even recommended it to some customers for data migration purposes when I worked at Dropbox). The simplicity is what I appreciate most: you can just run it every now and then to sync any new or updated files to your cloud storage of choice. Mind you, I could never get the upload throughput as high as the Dropbox client itself (but that might not be Rclone's fault).


> even recommended it to some customers for data migration purposes when I worked at Dropbox

Hah, I love this. I was having Dropbox sync issues a while back which support couldn't help with, and used rclone to prod and poke the state of my files and folders back into a state the DB client wouldn't choke on.


Restic has a nice feature set built on top of rclone: https://github.com/restic/restic - deduplication is particularly useful for systems with a poor upload connection.
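
Restic can also talk to any rclone remote directly through its rclone backend; a quick sketch, with gdrive: standing in for whatever remote you've configured:

    restic -r rclone:gdrive:restic-backups init               # create the repository on the remote
    restic -r rclone:gdrive:restic-backups backup ~/documents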


Restic is great. The only part I don't like is that you are forced to use a password. For local backups it's unnecessary and if you forget the password all your data is lost.


>For local backups it's unnecessary and if you forget the password all your data is lost.

As bad practice as it is, I'm guessing I'm not the only one with a super weak recycled password for instances like that.


I hope it is useful and interesting to point out that both restic and rclone (as well as borg and git-annex) are supported at rsync.net.

This isn't that surprising in the case of restic and git-annex, both of which transport over SFTP, but we have actually built into our environment the 'rclone' and 'borg' binaries such that you can execute them remotely:

    ssh user@rsync.net rclone copy s3:/some/bucket gdrive:/what/ever


And Duplicati too https://www.duplicati.com


Rclone is amazing. I’ve been relying on it for years. Seems like one of those best kept secrets in the industry.

It’s also a great advertisement for Go. A single binary installation. Fast. Runs on every platform.


I've been using a combination of rclone and borg backup to backup my Linux system to an S3 bucket on a daily cron schedule. I'm paying a little under $1/mo for storage, and it just works so I'm really happy with it so far. If you use a cheaper storage provider you could probably get it done for even less.
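
In case it's useful, a rough version of that kind of nightly job (the paths and bucket name are placeholders):

    borg create /backups/borg::'{hostname}-{now}' /home /etc    # deduplicated, incremental archive
    borg prune /backups/borg --keep-daily 7 --keep-weekly 4     # local retention
    rclone sync /backups/borg s3:my-backup-bucket/borg          # mirror the repo to the bucket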


How much space does that take?


I use two borg backups (one local, one remote).

I am very curious what you use rclone for (how it fits into your backup routine).


Correct me if I'm wrong, but for a remote Borg backup you need to be able to create the full backup on your local drive, and only then can you upload it to your server (unless you are able to run something like an SSH server on the server side, which you can't with a lot of cloud storage services).

With rclone, you can simply upload without having to create a full local 'snapshot' of the 'files-to-upload' first.


Ah sorry, I did not catch that you were talking about pure cloud storage solutions.

Yes, I have a remote VM where I can ssh and use it as a borg server.

The advantage I see in this vs a copy of the repo is that I have two independent backups. If one fails for some reason (defective disk for instance) then I do not copy a faulty backup further.


Why do you need borg backup? Just curious.


Borg does the backups. Rclone is for syncing data, which is not the same thing. For example, Borg does deduplication of the data. It also lets you see different versions of a file. IMO, syncing can never replace backups.


Do you mean borg is doing incremental backups?


I do something similar. Borg does incremental backups and snapshots.

It makes a snapshot so you can see how things were at a given moment. The only new data are the changes made since.

Then rclone copies that new data over.


Borg does, but if you sync with rclone you might end up uploading the full data on every upload, depending on how your data changes. This can be very costly, and that's why they suggest an active host for backups.


Borg's dedup is more than incremental backups. It dedups blocks/areas even within the first run. No significant run of data is repeated within a single repository.


Yes, AFAIK it does.


I've been looking for something like this for years. Never found it during my googling. My google-fu is weak :(

BTW how does the sync work? Does for example AWS expose a (free or supercheap) way of getting the SHA1 of a file on S3?


It uses size and/or modified time if the underlying backend doesn't have MD5s available. https://rclone.org/commands/rclone_sync/


But for a repository with millions of files, the listing requests required would be a substantial cost (and something that has to be done every day).

It would perhaps be useful to have a mode where a repository of file hashes is kept in parallel at the cloud provider, as just a file that can be downloaded every day; all the logic is done locally, only the files that differ are uploaded, and then the hash repository is updated.


If you want to keep the costs down with S3, you can use --checksum or --size-only to avoid reading the metadata with HEAD requests.

You can also use --fast-list to use fewer transactions at the cost of memory (basically just GETs that read 1000 objects at a time).

You can also do top-up syncs using `rclone copy --max-age 24h --no-traverse`, for example, which won't do listings, then do a full `rclone sync` once a week (say), which will delete stuff on the remote that has been deleted locally.
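
Putting those together, something like this (the bucket path is a placeholder):

    rclone copy --max-age 24h --no-traverse /data s3:bucket/backup    # daily top-up, no full listing
    rclone sync --fast-list --checksum /data s3:bucket/backup         # weekly full sync, removes deleted files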

There is also the cache backend which does store metadata locally.


Hmm, ok seems like you could end up with something practical! Unfortunately any misstep will give you a $100/month extra bill. It was stuff like this I was hoping would be solved already without extra hacks :)


I use rclone to do account backups to B2 on my Virtualmin-based web server, which only supports S3 natively. I run a backup, rclone it to B2, then remove the local copy. It has been working wonderfully for quite a while now.
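
For the curious, the copy-then-delete can also be a single `rclone move`, which removes the local files once they've been transferred (the bucket name here is made up):

    rclone move /var/backups/virtualmin b2:my-site-backups/virtualmin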

One of the wonderful things about rclone is that it's the right tool for so many jobs. Much like grep, sed, awk, etc, it is as simple or as complicated as you want it to be.


They have recently merged multiwrite support[1] for their Union[2] backend, which is big news for me.

Even if you're not using any cloud storage, rclone works with the local filesystem, WebDAV, FTP, SFTP, Nextcloud and Minio. To my knowledge, Rclone is the only free-software solution that implements some of MergerFS's features on Windows.

Amazing tool.

[1] https://github.com/rclone/rclone/pull/3782

[2] https://rclone.org/union/


Excerpt(s):

"Rclone ("rsync for cloud storage") is a command line program to sync files and directories to and from different cloud storage providers."

The following cloud (and other) storage providers are supported:

1Fichier

Amazon Drive

Amazon S3

Backblaze B2

Box

Citrix ShareFile

Dropbox

FTP

Google Cloud Storage

Google Drive

Google Photos

HTTP

Hubic

Jottacloud

Koofr

Mail.ru

Mega

Memory

Microsoft Azure Blob Storage

Microsoft OneDrive

OpenDrive

Openstack Swift

pCloud

premiumize.me

put.io

QingStor

SFTP

SugarSync

WebDAV

Yandex Disk

The local filesystem

The program is open source, written in Go (aka Golang).

Related:

https://rclone.org/

https://rclone.org/overview/


I use rclone daily to sync Microsoft Teams (aka SharePoint) files from that horribly slow web-based UI to my local filesystem, where I can browse them crazy fast with fzf.
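
Roughly, for anyone wanting to copy the setup (the remote name and library path are invented):

    rclone sync teams:"Shared Documents" ~/teams-mirror    # pull the SharePoint library down locally
    cd ~/teams-mirror && fzf                               # fuzzy-find across it instantly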

A good document indexer should help me further, if anyone has suggestions...


Check out "Everything" https://www.voidtools.com/ It's blazing fast, and has tons of great features.


I mostly prefer Open-Source, Linux-compatible stuff.


Are you talking about searching file contents?

Might check out something like ripgrep, but that's for searching code and other text-like files. Won't help you out with MS Word docs or anything like that.


I used to use Google Desktop Search (!!!) back in the day; it even had a Linux client. I'm aware of Tracker (built into GNOME) etc., I just haven't done my research yet :-)

rg I use hourly for code-related navigation & searches tho. Good stuff


I have a cron job to back up my Plex to Backblaze.

My primary concern is that it works so well that I forget about it. This is often the case with many automations.

That is why I write documentation for myself.


I see I'm not alone!


This looks interesting, especially since it supports encryption, which I consider mandatory.

I'm wondering what cloud provider² that works with rclone offers the most storage per dollar? Caveat: I only need 100GB.

--

² ideally in Europe


Depending on how much storage you need, and whether you need instant or delayed access to your files, there's a couple of ideas like

- Scaleway's new AWS S3 Glacier-like storage (75GB free, in Paris)

- OVH's regular object storage vs. "cloud archive" (multiple EU locations)

- Wasabi (S3 compatible, but prepays for 1TB) (in Amsterdam)

The "cold" storage options are obviously cheaper, but have bigger problems in my experience of playing nice with backup and sync solutions like rclone.


Google G Suite Business account, $10/month/user. Technically 1TB of Google Drive if you have under 5 users, but I had 30TB when I only had 4. It's truly unlimited if you have 5 users.


B2 and Scaleway are quite cheap (Wasabi would have been cheaper, but that's >= 1TB). Check out this post [1] - not mine.

I personally use b2 with restic.

[1] https://medium.com/@simon_80033/how-cheap-can-cloud-storage-...


Backblaze; choose the central Europe server located in NL. 100GB will probably be around 0.5 USD/mo. You read that right.


Thanks, i'll check them out.


For only 100GB, DigitalOcean Spaces has 250GB for $5 per month AFAIK. Includes some bandwidth also. It's S3-compatible and I can vouch that it does work with rclone as I use the two together several times per week.


It will depend on what you're using it for. I use Azure Archive storage for last resort backups. Will be reasonably expensive if I ever need to restore the data, but until then it's £0.00135 per GB per month.


I use rclone with backblaze b2, very happy with both.


most likely backblaze B2.

Edit: OVH ?


I love Rclone, I use it to sync my static website to Neocities (which I also love) after building it on Gitlab CI:

https://gitlab.com/stavros/neocities-gitlab-ci-demo


I love rclone. My only gripe is that the "sync" command doesn't really act like a sync: it overwrites the target directory with the source directory, and there is no synchronization between the two. It was just unexpected behavior the first time I used it.

Other than that, it's awesome how it works with so many data storage providers. If it helps anyone, the last time I looked at pricing (a couple of months ago) IBM had the cheapest per-GB cloud storage of all the providers. I already have 1TB of Google Cloud for $2/mo, so I'm using that to keep my life simple; however, it's slow as far as data stores go.


Weird. I use rclone sync to move files between OS X in my room and Ubuntu in the garage, and both of those to Google Drive. It has always just moved new or changed files.


The sync appears to be unidirectional only

>Can rclone do bi-directional sync?

>No, not at present. rclone only does uni-directional sync from A -> B. It may do in the future though since it has all the primitives - it just requires writing the algorithm to do it

The difference between "copy" and "sync" is also a bit subtle:

> Copy files from source to dest, skipping already copied

> Make source and dest identical, modifying destination only.

That is, while "sync" may delete files on the destination so that it mirrors the source, copy will never do so.
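
In other words:

    rclone copy /local/photos remote:photos    # adds and updates files, never deletes on the destination
    rclone sync /local/photos remote:photos    # makes the destination an exact mirror, deleting extras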

I feel a better name for "sync" would have been "clone", conveying both the unidirectionality and the fact that it will make the destination like source.


I use rclone on my NAS with OpenDrive syncing with a couple of Cron tasks to keep my files backed up. They have an unlimited plan for $10/mo, which is the only cloud provider I could find with such a plan...

You're technically not supposed to do this with a NAS but I only have about ~40GB of files so far. If/when I get into the "Many TB" territory, I'll figure something else out..


The sync comparison might need to happen on something other than the default; that could be why it's not syncing. For example, you might need to compare by modification time instead of MD5.


Hm, you might want to look into the `sync` subcommand:

https://rclone.org/commands/rclone_sync/


In addition to the other solutions folks are listing, duplicity also supports using rclone as a backend, and in my experience it works well.

If you're not familiar with duplicity, it uses GPG to encrypt tarballs of your backup data locally and keeps an index of each tarball separately (also encrypted), which it caches. This reduces the provider API calls to just the encrypted bundles (payload, indexes); the software then works on the indexes locally to figure out what it needs to back up and makes new bundles to upload. (Restores work the same way: it searches the locally decrypted index caches rather than making API calls.)


Last week I was attempting to sync a directory from a public S3 bucket using rclone (which is an awesome tool). I was surprised that apparently there's no way to do this without using AWS credentials (even though the bucket is public). We ended up moving that dataset to Google Drive, but even there you have to go through an OAuth flow, even for public data.

Is anyone aware of a way to sync a remote directory to a local filesystem without authenticating, either with S3/drive or another cloud storage provider (and using rclone or another CLI tool)?


Just installed rclone yesterday to push a lot of files to OVH's cloud object store. Bonus points to OVH for providing the option to export an rclone config from their UI.


There's a similar-ish project called Cryptomator, which encrypts files before storing them at cloud providers. Alas, it's only open source on desktop, not on Android.

A web search shows that a bunch of Android clients exist for Rclone; the project's wiki points to 'RCX'. I'll have to try it and see.

I dream occasionally of an encrypted, file-level RAID-5 made from the free storage that the services have already bestowed on me.


There's actually an issue[1] open for implementing Cryptomator encryption support in Rclone.

I use Rclone on my Android device through Termux[2] which works pretty well. Termux includes Rclone in its repo[3]. I just set it up to `serve webdav` then access it through my file browser app which includes WebDAV support.

* [1] https://github.com/rclone/rclone/issues/1647

* [2] https://github.com/termux

* [3] https://github.com/termux/termux-packages/tree/master/packag...
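
For reference, that Termux setup boils down to something like this (the remote name and address are placeholders):

    pkg install rclone                                      # rclone from the Termux repo
    rclone serve webdav gdrive: --addr localhost:8080       # then point a WebDAV-capable file manager at it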


RCX developer here.

I've thought about including Cryptomator support in the app by reusing the Java libraries, but licensing issues make that hard. Support in rclone itself would be really great.

BTW: RCX exposes rclone's serve functionality as well, and will get Storage Access Framework (SAF) support in the near future. When it works, SAF seems like magic.[1]

* [1] https://imgur.com/a/cppw03f


RCX dev here.

I also have this RAID idea, because the free offers are a marketing tool, and providers have no problem shutting off rclone access even for paying customers (happened with Amazon Cloud Drive, Yandex Disk).

There is currently a beta of a new remote type, "multiwrite union" (RAID-1 style). But for RAID-5, storage is just too cheap - some providers are as low as $5/TB-month.


Well, if I can have several different mirror-bunches set up, that would probably do the job nicely while being easier to implement =)


Re Android: rclone is in termux's repos.


Recently I decided to resurrect a very old white MacBook (2008) by installing FreeBSD on it.

All of my workstations (laptop and desktop) have Dropbox installed, yet Dropbox is not available for FreeBSD.

This is where rclone comes to the rescue. There might be other software that could take on the role of a Dropbox client, but rclone gives me the flexibility of switching to another provider if I decide to do so in the future.

Bless the author of rclone.


Also, rclone seems to have a pretty decent forum and user community (important for tech issues, if/when you ever have them) at:

https://forum.rclone.org/


Uncanny, I was looking for something exactly like this yesterday to upload screenshots automatically to Google Drive. Ended up using Dropbox's screenshot save feature but I'll switch to rclone instead.


Rclone saves my work life when uploading many GBs to SharePoint. It works around many of SharePoint's typical upload issues.


Rclone is a very much appreciated tool in my collection. Thank you all for this very useful and well-made tool.


I would love iCloud support for this, but unfortunately I don’t think Apple will ever budge :(


Have you tried syncing to/from the iCloud Drive directory directly? (Backup your data first!)

    ~/Library/Mobile Documents/com~apple~CloudDocs/
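
e.g. something along these lines (the destination remote is a placeholder):

    rclone sync "$HOME/Library/Mobile Documents/com~apple~CloudDocs/" remote:icloud-mirror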


Any opinions on Rclone vs Borg?

Seems like both support deduplication, which I thought was the advantage of Borg


Rclone is mainly for keeping files in sync and transferring files between multiple storage providers, for example copying a file to GDrive/Dropbox when such a file is added to S3 or locally. It supports multiple storage providers.

Borg is for backups. It doesn't support any storage provider.

One can use both of them jointly, creating borg backups and saving them on GDrive for example.


You can use rclone sync with `--backup-dir` which is a flag stolen straight from rsync so you can keep old versions of your data.

You do something like this

    rclone sync /path/to/source remote:destination/current --backup-dir remote:destination/$(date -I)
Which will give you a complete current backup and sparse historical backups.

So for historical backups you can have deduplication, but rclone doesn't support deduplication within the sync itself, so if you have two identical files within a directory, rclone will upload them both.

I don't know Borg very well but you can use it to back up to an rclone mount. I did look into making a borg server for rclone so it could speak borg protocol directly over ssh. It wouldn't be too hard, but the protocol isn't documented so it would mean reverse engineering the python code.


Is there a design document for this? I am curious about how this handles synchronization.


Does it support bi-directional sync yet, so that it can serve as a unison replacement?


It does support a writable FUSE Mount.


Erm, please forgive my ignorance but how would this help?


Really cool project. But I still want to find a way to sync with iCloud. I need to pay for more space so I can back up my iPhone, but right now it's mostly empty space, since it's a pain to sync from Linux.


They have an issue for it: https://github.com/rclone/rclone/issues/1778. But it looks like the ball is in Apple's court to provide a publicly consumable API that's documented.


Yeah... for a company that wants to be a services company in the future, Apple sure is still a pain in the ass to deal with outside the ecosystem.


I'm in the same boat. My hacky workaround is to use Syncthing to keep the iCloud directory synced up with non-Apple devices.


Deployment using rclone is really convenient: release packages can be synced into a bucket, and then from the bucket onto servers.

rclone also works smoothly with CI servers.



