Distributing NixOS with IPFS (sourcediver.org)
241 points by robto on Jan 19, 2017 | 75 comments



I've been following these github issues for a while; fetching sources from IPFS seems like a great step forward for resiliency in general, and quite a natural one for Nix considering things are already immutable. Using IPFS as a binary cache is nice, as it would lower the maintainers' burden and make out-of-tree experimentation easier, i.e. without damaging the integrity of nixpkgs and cache.nixos.org.

I hadn't even thought about using the FUSE integration of IPFS, but it makes a lot of sense. Nix is a lazy language, and the nixpkgs repository basically defines one big value: a set of name/value pairs for every package it contains (as well as various libraries for e.g. working with Python packages, Haskell packages, etc.). The only difference between installed/uninstalled packages is whether anything's forced the contents to be evaluated yet.

Likewise, an IPFS FUSE mount conceptually contains the whole of IPFS. The only difference between downloaded/undownloaded files is whether anything's forced the contents to be evaluated yet.
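
A quick way to see that laziness in action (a minimal sketch; the attribute names and trace messages are made up and not real nixpkgs attributes):

  # Defining the set evaluates nothing; only the attribute we force is
  # evaluated, so only "evaluating hello" is ever printed.
  nix-instantiate --eval -E '
    let pkgs = {
      hello = builtins.trace "evaluating hello" "hello-2.10";
      vim   = builtins.trace "evaluating vim"   "vim-8.0";
    }; in pkgs.hello
  '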


This article doesn't mention the most significant fallout of the IPFS idea (imo), which is that of .nar deduplication, as detailed in the issue https://github.com/NixOS/nix/issues/859 (point 4).

Perhaps a nail in the coffin of one of Nix's biggest absurdities.


Very cool.

One benefit of schemes like this that people don't talk about much is that, by no longer downloading from an expected place, you're removing the possibility for a compromised developer or server operator to selectively serve up malware to a targeted user. Instead you're getting the file over bittorrent and checking its hash, and you could gossip with other bittorrent clients to confirm that everyone's trying to get the same hash.

Compare with the state of the art in most software updates, which is that you connect to some download server and it could serve signed malware to people on its target list and probably no-one would notice.

(Schemes that use some of these techniques to take out the single point of malware-insertion have been called "Binary Transparency" schemes, as an analog to Certificate Transparency.)


Guix has a very good complementary approach to this problem, guix challenge. Perhaps it's already implemented in Nix too:

https://www.gnu.org/software/guix/manual/html_node/Invoking-...

Basically, since builds are reproducible, you can automatically build from source and see if the hash of the binary you built matches the one you are downloading.
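
An illustrative invocation (package name arbitrary): guix challenge compares the hash of your locally built store item with what the substitute servers advertise:

  # Compare the locally built openssl against the binary offered by the
  # configured substitute servers; a mismatch flags a non-reproducible
  # or tampered binary.
  guix challenge openssl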

Obviously, source can be still compromised. But that's probably something IPFS won't fix unless wherever you get sources from is also on IPFS.


`nix-build` can take `--check` to do a similar thing. However, not all packages are reproducible. We've been doing a bit of work on this: https://garbas.si/2016/reproducible-builds-summit-in-berlin.... and have begun checking reproducibility in our CI system: http://hydra.nixos.org/jobset/nixos/reproducibility
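
For illustration (package name arbitrary), the Nix-side check looks roughly like this:

  # Rebuild the derivation and compare the result against the existing
  # (downloaded or previously built) output; a difference means the
  # build is not reproducible.
  nix-build '<nixpkgs>' -A hello --check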


Just a tiny nitpick, if I may: most builds are reproducible, but not all. Out of ~5000 packages, ~600 are not yet reproducible.


Sorry, I don't understand. If the problem is a hypothetical situation where a compromised developer uploads malicious code, then how does IPFS relieve any pressure from that circumstance?

Individual IPFS nodes are certainly blindly trusting the developer's signature as a stamp of approval. Adding more nodes doesn't make that problem better. It makes it worse by providing a greater false sense of security.

In the case of a compromised server operator, as long as hosting company X is smaller than Amazon, it's always better to use Amazon's cloud service to mitigate the possibility of server operator tampering.


I think the point is that you can't serve a malicious version to a specific user, you must serve it to everyone, which makes it much easier to detect.


It makes targeted attacks more difficult because you have to compromise all of your users rather than offer a different download to one.


Agreed. I think it only makes sense to say it offers tamper protection if you have reproducible builds and are distributing the source code via (for example) IPFS. But even that is then questionable, because who's auditing the source code? Or the builds? Or the compiler?

Trust isn't really something you can algorithmically fabricate. At a certain point it always reduces to a tautology: "I trust this thing because I trust it." Distributed compiled code, because of its opacity and complexity, is an excellent example of exactly how hard it is to kick that bootstrapping tautology further down the road.

Distributing binaries via IPFS is functionally identical to distributing signed binaries from a central server, provided clients always check the signature. Now, that last bit isn't necessarily always true, but if your problem is "why aren't my clients checking their signatures", solving it with IPFS just doesn't make sense. It's like saying "This person isn't PGP signing their emails, so I'm going to download all of my emails using Bittorrent."


IPFS doesn't use BitTorrent. Its transfer protocol, Bitswap, is BitTorrent-like, with some "new protocol in town" problems.


I wonder why IPFS hasn't been built on top of Bittorrent, so it could reuse some of the existing infrastructure.

For example, everybody hosting a file in IPFS could automatically become a BT seeder for that file. And searching on Pirate Bay for a certain file path / hash could make the file accessible to people who have a BT client but do not (yet) run IPFS on their machine.


It's my impression that Bitswap was chosen/made to have an easier time making "trading strategies" with other peers.

BitTorrent would be harder to modify for that purpose while still claiming backwards compatibility, so I think they went all-in on Bitswap, with pluggable "trading strategies", i.e. how to decide which peer to share with and how much, how to avoid free-leeching, and so on. There was a mention of combining IPFS with Filecoin; Bitswap would make that integration easier.


One could implement an IPFS-BitTorrent bridge. WebTorrent's hybrid peer [1] does something similar.

[1] https://github.com/feross/webtorrent


This kind of backwards compatibility works against network effects for the new protocol though.


Another way to solve this problem is to have binaries signed by a trusted developer/group. Not necessarily better or worse than using decentralization to solve the problem, just another good idea.


Yeah, that works as long as the trusted group aren't themselves coerced into signing targeted malware, or choose to.


That's really no different to the developers choosing to write malicious code into their own product. A little paranoia about the storage and transport of executable data is understandable but if you cannot even trust the company employees not to compromise their own software then you shouldn't be using their products in the first place.

Where your distrust of keys would have more merit is the storage and strength of said keys. E.g. if they don't have a strong passphrase and are stored on an NFS / CIFS share, then one could argue that they're no more secure than a bespoke build script.


If we believed this argument, there would be no need for Certificate Transparency, which understands that CAs can create certs for any domain -- they've got the keys to do it -- but mistrusts their signatures by default and trusts them only to the extent that their signatures (update bundles, in the software analogy) are logged and seen by everyone at the same time.

I think we can do the same thing for software. Why not try?


I get what you're saying, but my point is that if you're running executable software from a particular company then you have to trust their staff with regard to deliberate malicious intent. Ergo, if you don't trust the software signers not to deliberately sign malicious software (whether voluntarily or via coercion), then you equally cannot trust the developers not to write malicious code first hand and have it signed, either with or without the certificate holder's knowledge. If one cannot even trust the developers not to inject malware into their own software, then you might as well give up on that company.

Yes, there are tools one can run to protect themselves against the aforementioned, e.g. application firewalls, sandboxing, network firewalls, etc. But at some point you have to trust that Microsoft Office / Firefox / your favourite Linux distro / whatever was built honestly from the outset, as software - even with the source code available in the case of OSS - is far too complex to reliably vet before running in production.

The issues with storage and transport (IPFS, HTTPS, etc.) are a different matter because they're there to protect against external attacks rather than corruption within the company itself. This is where software signing might fall short: not because of disreputable people within the business, but because of negligence (e.g. certificates not being stored securely, so attackers can inject malware into the software and then sign it themselves).

So I'm not against criticisms regarding software signing; I just don't agree with your points regarding the motives of the signers. Simply put, if you cannot trust key people within a business to write and release software honestly (negligence aside), then you should not be installing nor running their software to begin with.


I want to ditch the Nar format as soon as possible. IPFS's unixfs format is too rich however.

When will the IPFS people finish up https://github.com/ipld/cid so we can link whatever content addressable data we want?

I'd use git tree objects, despite SHA-1, because it's widely supported. Or do a format identical to git tree objects but with IPFS's multihash and SHA-1 banned.

The point is, the underlying protocol should be agnostic to the hashing scheme; we should have a trait/type class like:

  /// Node in the tree
  trait Payload: Sized {
    type Hash: HashingTrait;

    /// Split a node into its raw bytes plus the set of hashes it links to.
    fn unpack(self) -> (Vec<u8>, Set<Self::Hash>);
    /// Reassemble a node from raw bytes and links.
    fn pack(bytes: Vec<u8>, links: Set<Self::Hash>) -> Self;

    // Implement either and get the other for free!
    fn hash_packed(p: Self) -> Self::Hash {
      Self::hash_unpacked(Self::unpack(p))
    }
    fn hash_unpacked(u: (Vec<u8>, Set<Self::Hash>)) -> Self::Hash {
      Self::hash_packed(Self::pack(u.0, u.1))
    }
  }

Any `(Hash, Payload)` pair that can define a `(binary blob, Set<Hash>) -> Hash` function and a Payload function should work.


Hey! IPFS Dev here.

The cid stuff has been implemented and initial support for it is being landed in our 0.4.5 release (which will be soon, hopefully release candidate within a week).

With that and IPLD, you can craft arbitrary objects in JSON or CBOR (there's a 1-to-1 mapping; objects are stored as CBOR) and work with them in ipfs. For example, i could make an object that looks like:

  {
    "Contents": {"/":"QmHashOfPkgContents"},
    "Compression": "bzip2",
    "NarSize": 12345,
    "References": {
      "foo": {"/": "QmHashOfFoo"},
      "bar": {"/": "QmHashOfBar"}
    },
    "Signature": "signature info, or a link to the signature",
  }
(please excuse my attempt at recreating a nar file in rough json).

This object could then be put into ipfs with:

  cat thing.json | ipfs dag put
And you would get an ipld object that you can move around in ipfs, and do fun things like:

  ipfs get <thatobjhash>/Contents
to download the package contents, or:

  ipfs get <thatobjhash>/References/foo
to get the referenced package (or open that hash/path in an ipfs gateway to browse the package graph for free in your browser :) )


IPLD does allow storing tons of data, but custom schemas allow restricting the data referenced in arbitrary ways.

IPLD, last I checked, supports relative paths (which can make certain cycles), and not every node child gets its own hash. This is too much flexibility for my purposes (Nix or otherwise).

Also, when interfacing with legacy systems like git repos, one needs to dereference a legacy hash without knowing what it points to, which is easiest done with custom schemas.

Now, granted, custom schemas aren't a super fine-grained solution, as every node in the network that cares about the data needs to implement the schema, but they are a useful tool for these reasons (and that downside doesn't apply to private networks).


Also see the multicodec table for codes of ethereum/bitcoin/zcash/stellar

https://github.com/multiformats/multicodec/blob/2725f3c5cd7b...


Ok, so it's good we can finally refer to other node types. But I worry about putting all that in a single namespace. The IPLD node types constitute different hashing strategies, as I describe above, but stuff like media codecs is orthogonal to hashing strategies: media of various sorts, given a hashing strategy, will be treated as black-box binary data for the foreseeable future.

The big takeaway here is I really like the idea of IPFS, and want to be a full fan, but everywhere I look I see dubious interfaces. I see what already looks like legacy cruft, and they haven't even hit 1.0!


Is there a reason git is not on this table yet?


> Also, when interfacing with legacy systems like git repos, one needs to dereference a legacy hash without knowing what it points to, which is easiest done with custom schemas.

The CID (address format) in IPLD doesn't represent types of systems, but it represents types of data structures. E.g. in the case of git, it's not "git,$thehash", but instead "git-tree,$thehash" or "git-commit,$thehash".

That way you know which code you'll need to run once you have the object's payload, or you could have datastores that simply pull blocks out of a git repo.

Is this getting closer to what you mean?


Yeah, my OP was asking what's happened to CID. I guess it's been implemented without finishing off the spec :/.

While I'm not opposed to treating git that way, do note that git hashes are specifically constructed by prefixing the serialization of blobs, trees, and commits separately so that collisions are not likely.


It is incredible that just today I posted this on github: https://github.com/ipld/ipld/issues/16

Structs/traits already exist in some form that is not well defined yet. What you are describing is a generalization of what is happening with Ethereum right now: we have eth-block as an IPLD "Format", which is basically a struct with one particular characteristic, namely that the parser, instead of being written in an IPFS VM language, is written and executed as part of the daemon.

The idea that you have described above, or a subset of it, is part of the plan!

Please do participate in the issue that I linked you to!


@Ericson2314, yeah, thanks for bringing this up. As was mentioned elsewhere:

* CID is finished and live in go-ipfs@master and js-ipfs@master. We haven't announced it widely because go-ipfs@0.4.5 is still to land. (ooof)

* IPLD spec needs work, but work continues.

I wanted to add that:

* please contribute to CID to get it where you need it to be.

* i am personally very interested in defining IPLD data structures and their operations in a good language. This will probably be transpiled to the IPFS impl language, or compiled down to WebAssembly and run in a small WA VM (the web of datastructures)


I've felt for a while that a standard, widely-implemented, distributed content-addressable store is one of the biggest missing pieces of the modern internet. Glad to see any steps in that direction.

I'll know real progress has been made when my browser can resolve something like:

cas://sha256/2d66257e9d2cda5c08e850a947caacbc0177411f379f986dd9a8abc653ce5a8e
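
Public IPFS gateways already get close to that today; a rough sketch (the hash is a placeholder, not a real object):

  # Fetch content by its hash over plain HTTPS via a public gateway:
  curl https://ipfs.io/ipfs/QmSomeContentHash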


Completely agree! Another comment has mentioned beakerbrowser.com, and at IPFS we're about to get js-ipfs in the browser to interoperate with go-ipfs. The goal with js-ipfs and the firefox addon is (1) seamless support for fs:/ipfs/somehash URIs, and (2) to offer the ipfs APIs to webpages.

* https://github.com/ipfs/js-ipfs

* https://addons.mozilla.org/en-US/firefox/addon/ipfs-gateway-...


Seems like a good place to ask, is there a guide for IPFS implementers? I know there is a bunch of specs, but it's a bit hard to understand the complexity of the thing and where to look in case I was looking to implement it, instead of say bittorrent-based content delivery.


We have a bunch of interfaces + tests you can use, and a good place to start is over here: https://github.com/ipfs/specs/blob/master/overviews/implemen...

We want to make it easier though, and haven't really got there yet. But if you have questions, #ipfs on freenode is a good place where core devs and the community hang around usually.


Acknowledged -- the documentation around IPFS and libp2p is not exactly delightful yet :) A new community member has written up their notes on diving into the IPFS ecosystem here: https://github.com/ipfs/specs/issues/145


There's a Docker repo which you may find useful: https://github.com/ipfs/go-ipfs#docker-usage



Bittorrent's DHT is used that way now; it's probably the biggest public DHT deployment in existence. All a magnet link is at its core is magnet:?xt=urn:btih:<infohash>, so just having the infohash of a torrent is enough for you to get its content.

Of course, there's also IPFS, Zeronet and Freenet, which all address this exact issue in slightly different ways, all more web-targeted.


I thought Zeronet used IPFS underneath?


I believe it's an entirely separate network, not DHT-based like IPFS, but they've done a lot of cool work on the client side of things to allow user accounts and stuff.


Morphis is supposed to provide something like this: https://morph.is/


Client support is not "real progress", contrary to popular adoption views. Neither are standards, now that I think about it.


Take a look at https://github.com/ipld/cid and the fs: handler and browser support mentioned in other comments.


That's exactly what the WWW is, though. Your browser knows how to resolve a domain with DNS and fetch a file over HTTP at a certain path.


It's the difference between "get whatever this is" and "get this, wherever it is".


WWW isn't content-addressed though. There can be varying content at the same address, and identical content at different addresses.


HTTP is centered around hosts being the authority over what content a certain URL maps to, while IPFS and other content-addressed systems don't have a notion of server/client, or central authority. The content itself is the authority, as its name is derived from the content itself.


Nice project! Guix had a GSoC student working on binary distribution using GNUnet's file sharing component a while back: https://gnu.org/s/guix/news/gsoc-update.html . That has not led (yet?) to production code, but there might be ideas worth sharing.


It's almost ridiculous how well the two fit together.

I had the feeling NixOS has a bit of a hard time getting users and proving that it's a superior solution to ansible/docker/chef/etc., probably because of its mediocre UX, haha.

But this would add another killer feature to it.


Very interesting development. It would be great to see NixOS as an early adopter for IPFS.

BTW, there is a small typo:

    IPFS is aims to create the distributed net.
It should be:

    IPFS aims to create the distributed net.


This is a great idea. A lot of businesses heavily rely on old versions of open source packages always being available. The one time someone deprecated an npm package, half of the nodejs stacks went with it.

edit: Didn't mean to hit reply. Sorry.


It doesn't really solve that use case though, does it?

E.g., IPFS isn't permanent hosting - it's purely hosting as long as there are seeds, like bittorrent. Hypothetically, if a package is very old there may be no seeds for it anywhere. Someone (NixOS/etc.) will still have to pay for hosting.


It does provide a solution, though. If someone is still hosting it, you can access it in the same way. Nobody can independently pull a specific version down. They could remove their own copy and hope nobody else is providing it, but if it were a big problem, other people could quickly start providing it and nobody would notice the difference.

Perhaps an equivalent thing for "someone will still have to pay for hosting" is that although that's true, anyone can put money into the pot to keep it going or bring it back; it's not reliant on the original creator to keep paying for it.
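
For instance, anyone who depends on an old package could chip in by pinning it on their own node (hash is a placeholder):

  # Pin the content so this node keeps serving it, independently of the
  # original publisher:
  ipfs pin add QmHashOfOldPackage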


Just delete your comment and put it on toplevel again.


You can't delete comments that have been replied to.


You can, and the reply would remain. The tree then looks like this:

    comment1
       [deleted]
           comment3
However, you can delete comments only until a certain amount of time. If you wait too long, you can't delete anymore (and this is what happened here).


That's how it used to work, but I'm pretty sure it doesn't work like that anymore.


I wonder if I'm the only one who is annoyed by the continuing lack of transparency at HN, wasting my time and other people's time with guesswork like this. I just took the time to start an "Ask HN" entry on that topic:

"Ask HN: Where can I follow the changes of HN itself?"

https://news.ycombinator.com/item?id=13460588


While that "Ask HN" entry got quite a few upvotes, this comment received downvotes. Why is that? Could anyone explain what's wrong with this comment?


thanks. fixed it.


I'm really excited to see what the future holds for IPFS! However, hosting websites with custom domains is not quite feasible yet. Using IPFS' DNS (IPNS) means you have to keep the IPFS daemon running constantly, or else the files will be purged within an hour.


> However, hosting websites with custom domains is not quite feasible yet

Sure, point your domain to your IPFS hash and use dnslink; it's quite reliable already, actually. That's how ipfs.io is hosted, for example, and we haven't hit any issues so far.

> Using IPFS' DNS (IPNS) means you have to keep the IPFS daemon running constantly, or else the files will be purged within an hour

So, the files won't be purged, but the record you push out with IPNS won't be valid after 24 hours. You can solve this easily by using /ipfs/:hash instead of /ipns/:id and it won't disappear.
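
To illustrate the difference (hashes are placeholders): an IPNS record has to be republished by a running daemon, while an /ipfs/ path stays resolvable as long as any node still has the content:

  # Mutable pointer: needs a running daemon to republish the record.
  ipfs name publish /ipfs/QmHashOfSite

  # Immutable path: resolvable as long as some node has (and pins) the content.
  ipfs pin add QmHashOfSite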


Yet again I failed to include important details... :p

Yeah, it certainly is possible to host static sites on IPFS -- I have been testing it on my site[0] just for kicks. However, since I really enjoy using my domain name, rather than "ipfs.io/ip{f|n}s/$hash", I'm reluctant to try adapting my site to IPFS. I am aware that this is an alpha product, though, and I can't stress enough how cool it is for files containing the same data to be given the same ID (hash). That way you don't have to run shasum ever again :D

[0]: https://ipfs.citrusui.me

(on a random note: what's up with /blog returning the IPFS blog, instead of my own content? did i misconfigure something?)


The HTTP gateway included in go-ipfs will try to use a Host header on requests for constructing an /ipns path. So e.g. http://ipfs.io/docs gets turned into /ipns/ipfs.io/docs -- this is actually how we host all webpages within the ipfs project.

edit: And the part which turns /ipns/ipfs.io into an /ipfs path is called dnslink: https://github.com/ipfs/go-dnslink -- it resolves to what's in the TXT record at _dnslink.ipfs.io.
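
As a sketch of the setup described above (domain, hash, and IP are all made up): point an A record at a host running the gateway and add the dnslink TXT record, and the gateway can then resolve the domain to your content:

  # Hypothetical DNS records for example.com:
  #   example.com.           A     203.0.113.7   (a host running the go-ipfs gateway)
  #   _dnslink.example.com.  TXT   "dnslink=/ipfs/QmHashOfYourSite"

  # Check what the domain resolves to:
  ipfs resolve /ipns/example.com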

> (on a random note: what's up with /blog returning the IPFS blog, instead of my own content? did i misconfigure something?)

I'm so sorry, that's a bug I put into the nginx configuration -- will fix it!


This is a really great idea! Reminds me of other projects that are working on integrating IPFS with the Operating System: https://github.com/vtomole/IPOS


Good to see other people are inventing AppFS ( http://appfs.rkeene.org/ ) :-)


> Good to see other people are inventing AppFS ( http://appfs.rkeene.org/ ) :-)

I'll take the construction "other people are inventing <the thing i invented some time ago>" to mean that you think you came up with this first, or at least prior to Nix or IPFS. And thus ":-)" to be a bit sarcastic and unhappy, instead of genuinely happy.

Similar times:

* http://appfs.rkeene.org/web/timeline?c=78c60b0c9e7da1c9&unhi...

* https://gist.github.com/jbenet/8f000606f2009495c56177f6ca2c1...

* https://github.com/ipfs/ipfs/commit/8004db75262fcd29399d6c7f...

* https://github.com/jbenet/random-ideas/issues/19

* i think the Nix people probably came up with this way before either of us

* and i am willing to bet everything that at least 100 other people (maybe thousands) have come up with this exact same idea over a decade before any of us...

In fact, most of the best "original ideas" in the IPFS body of work were probably first discovered decades prior.

I repeatedly see people succumbing to sadness over multiple discovery. It shouldn't be sad, it should be a happy event, as it confirms our own thoughts and presents an opportunity for collaborations. :)

Further thoughts here: https://gist.github.com/jbenet/8f000606f2009495c56177f6ca2c1...


1. I was genuinely happy that people are doing this. I'm happy because it's good for me (I can run more software in the way I prefer to, without installing it). I never expressed anything emotional other than the "smiley face", which does not denote sadness.

2. The items with similar timelines are not comparable. The link to AppFS is a link to working code being written and the other links are to documents regarding a desire to have working code written someday, maybe.

3. It's a good idea, which I did not claim to come up with. It's a refinement of how things were done on UNIX campuses long ago, where there was a common NFS-mounted directory with applications installed on it. This is definitely not original, nor claimed to be. Then the same model that Sun/Oracle uses for Solaris packages in "ipkg" in Solaris 11 is mixed in with that. The result is something very similar to something old as well called 0install, back when it used LazyFS. Again, not original. The thing that AppFS adds beyond just the merging of these ideas is the additional idea that the filesystem should be writable. This is not exactly original either, since the same thing is done for ClusterNFS (except per-system changes are preserved instead of per-uid).

Given that I've obviously used all these technologies to achieve similar goals it should be obvious that I don't think AppFS is original -- quite the opposite, I've been using it for decades.

The intent of my post, which you seem to have missed, is to point to something else that does the same thing as this new thing. The reason that I would do this is because the ideas are old and we can improve upon them by looking at similar implementations. I certainly spent a lot of time with 0install's LazyFS (probably one of the heaviest packages).

As an aside, I found your post condescending in tone. Given that you acknowledged that all your statements were based solely on assumptions for things I did not write, I do not think this was effective in communicating your thoughts over this medium. Overall, I'm not sure what your message would contribute if these assumptions you made were wrong. My guess (assumption) is that you did not consider what your message was if your assumptions were wrong. If my assumption/guess is wrong here, please disregard this section and do note that this message still has content that communicates meaning.


Please don't use SHA-1. It's almost broken.


The protocol is actually extensible, and the hashing algorithm MUST always be specified by the server (which the client could then choose to not accept, just as it can reject the certificate because of the signature algorithm).

Also, it would require a preimage attack against one of the hashed items to be useful, which SHA-1 will likely be resistant to for a long time (though the margin decreases with the number of items hashed); SHA-1 is unlikely to be vulnerable to a preimage attack in the near future based on what we know so far.

The signature and certificates that are used to validate the top-level index can be based on a far better hashing algorithm independently of the content-based hashing.


Use multihashes for hash algorithm agility :) https://github.com/multiformats/multihash


Support for multiple hashing algorithms is already built in and mandatory; everywhere a hashing operation is used, the hashing algorithm must also be specified.


Stage 2 seems problematic, at least the way I see it. Most users have at least a thousand derivations - is it possible to FUSE-mount each one?

Also: I think some people are unaware that Nix store hashes are not content-addressable. The best solution (which the OP is proposing) is probably to use the .nar hashes in IPFS, which are content-addressable.
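
A small sketch of that distinction (the store path is a placeholder): the store path's name comes from the build inputs, while the NAR hash fingerprints the actual output, which is what a content-addressed store like IPFS wants:

  # Print the NAR (content) hash Nix records for an existing store path:
  nix-store --query --hash /nix/store/<hash>-hello-2.10

  # Serialize the same path as a NAR and add it to IPFS, giving an
  # address derived purely from the contents:
  nix-store --dump /nix/store/<hash>-hello-2.10 > hello.nar
  ipfs add hello.nar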


Someone should do something similar with Gentoo's portage, because the potential of IPFS could lead to amazing things, like verified pre-compiled -march=native builds for every architecture Gentoo supports.


For a while, I have been interested in the idea of modifying the NixOS stdenv (standard build environment) to use a compiler that emits LLVM bitcode, and then having a function that takes any derivation to an equivalent derivation containing the result of running the LLVM IR through the specializer for your architecture. This would mean that you can share a binary cache with others but still get `-march=native` performance. There are also some pretty interesting ideas along these lines wrt. randomly permuting instructions to prevent ROP attacks (you could even implement that as yet another package -> package function, so that you don't have to do the full set of LLVM optimizations for every package at boot time).
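
A very rough sketch of what the per-machine specialization step might look like, assuming (hypothetically) that the cached outputs were shipped as LLVM bitcode; file names and flags are purely illustrative:

  # Re-optimize generic bitcode for the local CPU, then relink it:
  opt -O3 hello.bc -o hello.opt.bc
  llc -O3 -mcpu=native -filetype=obj hello.opt.bc -o hello.o
  clang hello.o -o hello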



