I like that "untrusted peers" are defined in the protocol, but unfortunately it's only an optional addendum [1]. To me, untrusted peering is the most important feature, and I hope that not being defined in the main protocol doesn't mean it will be treated as second-class in the implementation.
BitTorrent Sync only supports untrusted peers via API [2], and the only other open-source BitTorrent Sync alternative that I am aware of [3] left it out completely.
At one point I had it as a required part of the protocol, but later decided that it added a lot of complexity so moved it to an extension.
I'm currently working on a minor reorganization of the protocol in the protocol_cleanup branch. I'll see if I can fold untrusted mode back into the core protocol.
Internet connections can be quite asymmetrical. I have a 32 MBit/s downlink, but only 1 MBit/s up. That makes updating data on my tablet with data from home painfully slow unless I am at home. For me, a fast peer that I don't need to trust is pretty helpful.
But apart from speed and redundancy, I also hope for economy of scale. If there were a small market where several hosters offer peered hosts with X GB for $Y/month, it could drive costs down for everyone. Dropbox is asking for $0.10/GB/mon, which is about twice as high as it could be if the market were efficient.
They'd likely also help with data redundancy if you only have one computer (or, if all your computers/peers are in one building, the untrusted peers could be your 'offsite backup' of sorts).
How do the peers find each other? The installation instructions suggest that I just need to run 'clearskies share ...' on one computer, and 'clearskies attach ...' on the second computer. How will they find each other (on a LAN or, especially, if they're each behind different NAT routers)?
There's a common tracker, currently running at clearskies.tuxng.com.
The plan is to add DHT support similar to the DHT used by BitTorrent. We'll seed the DHT using the tracker.
If you want, you can also run your own tracker. You should also be able to add peers manually by IP address and port, but that ability is missing from the ruby client.
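To illustrate what tracker-based discovery might look like, here's a minimal Ruby sketch that parses a peer list for a share out of a JSON tracker response. The response shape (`peers` array with `ip`/`port` fields) is an assumption for illustration, not the actual clearskies tracker API.

```ruby
require 'json'

# Parse a hypothetical tracker response into "ip:port" peer addresses.
# The "peers"/"ip"/"port" field names are assumed, not taken from the spec.
def peers_from_tracker_response(body)
  data = JSON.parse(body)
  (data["peers"] || []).map { |p| "#{p["ip"]}:#{p["port"]}" }
end

# Example of a response a tracker might return for one share:
sample = '{"peers":[{"ip":"203.0.113.5","port":51337},
                    {"ip":"198.51.100.9","port":49152}]}'
puts peers_from_tracker_response(sample)
```

A real client would fall back to the DHT or manually configured addresses when the tracker is unreachable.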
Another way to do file syncing without a particular cloud provider is Camlistore: http://camlistore.org/
Brad Fitzpatrick is one of the creators (LiveJournal, memcache, etc.) and it's rapidly getting better and better. That said, it can still be a bit tricky to get everything set up.
Have you been able to use it? It is less user-friendly than the proprietary solutions. There's tons of documentation, but nothing that says "here's how to use it like BitTorrent Sync". In fact, IIRC the docs specifically say somewhere "you cannot use this like Dropbox".
Just sharing stuff between 2 PCs was very difficult (or I couldn't figure it out) and the annex program sat at 100% CPU most of the time doing nothing. Being written in Haskell is a turn off too. If I have to fix something, I want C, python, etc, not this crazy write-only language :-)
I recall my problem was trying to understand how to sync files I already had in other directories. Things certainly were not sync'd automatically. The walkthrough says you need to git-add files and then git-commit them: http://git-annex.branchable.com/walkthrough/#index3h2
The ~/annex/ directory ended up with symlinks to git objects, and the files themselves were nowhere to be found. I didn't know where things were/weren't sync'd already. Nothing sync'd across, and the assistant just said "all done" or something similar. At one point I remember it just containing a bunch of broken symlinks. Good job I was just testing it out; imagine if it had replaced my actual files with broken symlinks.
What all this boils down to is that git-annex is not as fool-proof as the proprietary solutions claim to be. Something as Free as g-a, but less complicated, would be great.
I have. It kind of worked but eventually corrupted my files. They all became text files with a long .git/ path in them.
The Jabber plugin doesn't work, so it wasn't distributed like btsync; I had a central "server". What this means is that if computer A kicked off a sync, a node wouldn't get updated right away. The "server" doesn't automatically push to all of the clients.
It worked ok. I've tried just about every sync solution out there, and btsync is the simplest "just works" one that I've found. The only problem I've had with btsync is that sometimes the mobile apps appear to be offline. But once they are woken up they'll start syncing.
btsync isn't a backup solution though, so I have bakthat backing up to Amazon Glacier.
The JSON packets are limited to 16MB - if this is supposed to contain a manifest of a deep-directory-with-lots-of-files, that might not be enough. I regularly rsync (on a lan) trees of a million files and 8 levels deep. The manifest for such a configuration will not fit within 16MB.
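To put rough numbers on that: the entry shape below (path, size, mtime, hash) is an assumed example, not the spec's actual manifest format, but any per-file JSON record of this kind runs well over a hundred bytes, so a million files blows far past 16MB.

```ruby
require 'json'

# Back-of-the-envelope manifest size check. The entry fields here are
# an assumed example of per-file metadata, not the spec's exact format.
entry = {
  "path"   => "photos/2013/vacation/IMG_0001.jpg",
  "size"   => 2_481_152,
  "mtime"  => 1385001600,
  "sha256" => "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}

bytes_per_entry = entry.to_json.bytesize   # on the order of 150 bytes
total = bytes_per_entry * 1_000_000        # a million-file tree
puts "~#{total / (1024 * 1024)} MB"        # far over the 16 MB packet limit
```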
I see there's an "rdiff manifest" extension, which is cool for syncing later changes - but the initial manifest will have to be transferred some other way.
As an aside, I am currently adding a more sophisticated manifest exchange in the "protocol_cleanup" branch that will remove the need to keep sending the entire manifest (other than on the first connection).
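The idea of not resending the whole manifest can be sketched as a simple delta between two path-keyed maps. This is an illustration of the concept only; the names and wire format here are made up, not the actual protocol_cleanup design.

```ruby
# Illustrative delta between two manifests keyed by path.
# Names and structure are hypothetical, not the clearskies wire format.
def manifest_delta(old_manifest, new_manifest)
  changed = new_manifest.reject { |path, meta| old_manifest[path] == meta }
  deleted = old_manifest.keys - new_manifest.keys
  { "changed" => changed, "deleted" => deleted }
end

old_m = { "a.txt" => { "sha256" => "aa" }, "b.txt" => { "sha256" => "bb" } }
new_m = { "a.txt" => { "sha256" => "aa" }, "b.txt" => { "sha256" => "b2" },
          "c.txt" => { "sha256" => "cc" } }
p manifest_delta(old_m, new_m)
# changed: b.txt (new hash) and c.txt (new file); deleted: none
```

Only `changed` and `deleted` need to cross the wire after the first full exchange, which is why the initial connection is the only one that needs the complete manifest.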
I use nas4free + samba/CIFS daily, works like a charm. On each client (mobile, PC) I have an application that performs regular backups. In terms of alternatives, there's also OwnCloud that can be deployed in-house.
How does ClearSkies compare to existing private cloud solutions?
Incredible! The OP is using SQLite, specifically Fossil, http://fossil-scm.org/. It upsets me that this open source venture does not give credit where it is due.
If I am wrong, please correct me.
I'm confused. The protocol spec doesn't mention SQLite. The ruby proof-of-concept doesn't use SQLite. I've not seen fossil before that I can recollect (I am the author of the clearskies protocol spec).
Granted, I'm on mobile, but how does this differ from BTSync for two computers? There needs to be some agent proxying discovery if machines aren't on the same local network.
For the sync app itself GPL would have worked great, but we want to have an easy-to-integrate sync library. Hopefully this will reduce the number of apps that require a cloud service to be able to synchronize the user data between devices.
Can _you_ explain why _you_ think someone losing interest in the project because of its license choice is bizarre?
What makes _your_ "I don't care about licensing, and neither should you" opinion not bizarre, and his "I don't care about projects with wrong license" bizarre?
I agree that the parent comment didn't add much to the discussion, but that's no reason to imply it was bizarre.
> Can _you_ explain why _you_ think someone losing interest in the project because of its license choice is bizarre
If it's a project that interests or is useful to me, then personally that trumps the license. If I discover a project useful to my workflow then I'll find a way to work with it, despite the license.
> What makes _your_ "I don't care about licensing, and neither should you" opinion not bizarre
Because it's my opinion and I generally don't find my opinions bizarre, otherwise I wouldn't hold them.
And I didn't say "I don't care about licensing, and neither should you" or even insinuate that.
He's more than welcome to care more about the license than the usefulness of a piece of software, and I'll still find it odd.
GPL is rather limiting in how you can use the code. A large company (e.g. Apple) won't let something GPL-licensed be used heavily internally if they can help it, since they won't be able to apply patches or modify it without releasing those changes ... this obligation adds significant legal burden and, furthermore, releasing the changes could reveal private details about the company's internals.
I don't think the commenter's reason is a good one, but I can understand the viewpoint; GPLv3 is quite limiting for some uses.
As long as it's only used and distributed internally, I don't believe the GPL has a problem with them modifying it without disclosing their changes. That's my understanding.
Speed isn't actually too much of an issue with the ruby client, since all the CPU time is spent in the GnuTLS and Digest::SHA256 code, which is already written in C.
The problem is portability to android and iOS. We additionally want to make the core easy to embed in other applications.
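For illustration, hashing a stream in fixed-size chunks with Ruby's stdlib `Digest::SHA256` keeps the per-byte work inside the C extension, which is the point being made above. A minimal sketch (the chunk size is an arbitrary choice):

```ruby
require 'digest'
require 'stringio'

# Hash an IO in chunks; the per-byte work happens inside
# Digest::SHA256's C extension, not in interpreted Ruby.
def sha256_file(io, chunk_size = 64 * 1024)
  digest = Digest::SHA256.new
  while (chunk = io.read(chunk_size))
    digest.update(chunk)
  end
  digest.hexdigest
end

puts sha256_file(StringIO.new("abc"))
# => "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
```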
He mentions an android client. If you want it to be fast, or not drain your battery, you need Java or C++. If you want it to run on iOS or Windows Phone, C++ it is.
Create a Dockerfile, build the image, and run a container. Sharing a folder inside it will print out an access code. Note this, then start another clearskies docker container and attach using that code.
Wait a few seconds, and your file should appear.