
In their FAQ

> How does it work?

> Using a "Virtual Distributed Filesystem" (VDFS), in other words; a decentralized database that emulates a filesystem. It indexes hardware filesystems to create a master database that is synchronized in realtime between your devices running Spacedrive.

> What makes this different to Dropbox or Google Drive?

> It is not a storage provider, Spacedrive is simply a database that exists on top of existing storage layers, from cloud services like Dropbox, Google Drive and iCloud to physical devices and external drives you already own. It doesn't provide you more storage, rather a supercharged view of your existing storage.

So more like Syncthing? Or rather Windows File Sharing/Samba? I don't really get it


That explanation reads like it takes in any kind of file system, local or remote, and combines them into one unified file system. Not really new, but most other file managers focus on the popular storage services (Dropbox, Google Drive, OneDrive, Samba, FTP) and are not open source. If you can easily create plugins for any remote storage, not just the traditional file storage services, or even make up your own, then it could become something promising.


It depends on your web browser. Just see what happens here https://webauthn.io/

Firefox on Desktop tells me to "touch my security key". Not sure how that works. Firefox Android gives me a few hardware options to store my passkey to. Chrome Desktop asks me to enable Bluetooth. Chrome Android asks which Google Account to use.


Just tried that with Firefox on Android and while it works, I can't find any evidence of a stored passkey on my device, let alone a way to export it.


Are you on Linux? AFAIK it doesn't work on desktop Linux.


You need a compact encoding for chess engines that explore billions of states as fast as possible to plan the best move.


Well, this is arguably a kind of compression, right? So you'd be trading CPU time for fewer bytes? Is that a desirable tradeoff at chess engine scales?


Bit packing/mapping etc. isn't compression the way you're thinking; it's concision. It requires that the program know what each bit means, rather than having the data tell the program what a value means (as a JSON structure might). That shifts where meaning is assigned strictly into the program: the convention must be encoded in the program, not figured out at runtime. (Technically you could turn that convention back into a form of config if you wanted; it just wouldn't be jumbled up with your data.) But it doesn't really compress the data itself; it's just efficient at representing it.

[edit] shorter version of the above: it stores the values, but doesn’t store what they mean.
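To make that concrete, here's a minimal Python sketch (a made-up layout of my own, not from any actual engine): a piece code and a board square packed into one small integer, where the bit layout is a convention the program knows rather than something the data declares.

    # Hypothetical layout: 4 bits for the piece code, 6 bits for the square (0-63).
    # Nothing in the packed value says which bits are which; that convention lives in the program.
    def pack(piece, square):
        assert 0 <= piece < 16 and 0 <= square < 64
        return (piece << 6) | square

    def unpack(value):
        return (value >> 6) & 0xF, value & 0x3F

    v = pack(5, 28)           # fits in 10 bits, i.e. 2 bytes on disk
    piece, square = unpack(v)

The same fact spelled out as JSON, e.g. {"piece": 5, "square": 28}, is about 26 bytes of text and still has to be parsed before you can use it.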


It's not compression in the normal sense of the word. Most of the parsing maps directly to data: you know, e.g., that the square of some piece is the next 5 bits, and in languages that allow it you can cast directly from that bit offset to, say, a byte. This is going to be dramatically faster than parsing much more loosely structured JSON. As database sizes increase you also get worse performance there, so it's a double hit. With these sorts of representations you get something orders of magnitude faster and smaller. Sometimes there really is a free lunch!

Also I'd add that the sizes involved here are kind of insane. I wrote a database system that used a substantially better compression scheme, averaging out to ~19 bytes per position IIRC, and I was still getting on the order of 15 gigabytes of data per million games. Ideally you want to support at least 10 million games for a modern chess database, and 150 gigabytes is already getting kind of insane, especially considering you probably want it on an SSD. But if that were JSON, you'd be looking at terabytes of data, which is just completely unacceptable.


To give you an example, the Syzygy tablebase for all endgame positions with 7 pieces remaining is 18.4 TB. The estimated size for 8 pieces is 2 PB.

Different applications have different needs: if you want to host a website with real-world tournament results involving only humans, you can probably get away with using more bytes. But if you're writing an engine that uses pre-computed positions, you want to be as compact as possible.

https://en.wikipedia.org/wiki/Endgame_tablebase#Computer_che...

I did laugh a bit at this, because "conventional server" with "64 TB of RAM" is hilarious to think about in 2023, but it will probably be the base config of a Raspberry Pi in 2035 or so:

> In 2020, Ronald de Man estimated that 8-man tablebases would be economically feasible within 5–10 years, as just 2 PB of disk space would store them in Syzygy format, and they could be generated using existing code on a conventional server with 64 TB of RAM


CPUs excel at decompression; more than one engineer has remarked to me that the fastest way to populate a cache is to decompress data in-line.


Is your assertion that it takes more time for a CPU to read values out of a 30 byte struct and do a couple shifts and branches than to parse a JSON representation?


The JSON needs to be parsed only once. Then it is (or can be) just any object to your liking.

JSON is for storing data as text, not for working with that text all the time.


It's not just reading, you need to process the data to get castling availability, en passant target etc.
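For illustration, a rough sketch (made-up bit layout, not any particular engine's format) of what that processing looks like when the state is packed: a few shifts and masks, no parsing step at all.

    # Hypothetical layout: bits 0-3 = castling rights (KQkq), bits 4-9 = en passant square (0-63),
    # bit 10 = "en passant available" flag.
    def decode_state(state):
        castling = state & 0xF
        ep_square = (state >> 4) & 0x3F
        ep_available = bool((state >> 10) & 1)
        return castling, ep_square, ep_available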


> There's exactly one hit

There are N possible sequences, and you try N times with a success probability of 1/N each (because it is a good hash function). This means the expected number of hits is 1.


The probability is 1/e


The probability of missing is 1/e
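A quick numeric check of both statements (the expected number of hits is exactly 1; the chance of missing every time approaches 1/e):

    import math

    N = 1_000_000
    expected_hits = N * (1 / N)        # exactly 1
    p_all_miss = (1 - 1 / N) ** N      # ~0.3679, i.e. ~1/e
    print(expected_hits, p_all_miss, 1 / math.e)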


A language model estimates the probability of a sequence of words P(w_1, ..., w_n) or equivalently P(word | context).

For compression, word sequences that have higher probability should be encoded with shorter codes, so there is a direct relationship. A well known method to construct such codes based on probabilities is Huffman coding.

This works whether you use a statistical language model using word frequencies or an LLM to estimate probabilities. The better your language model (lower perplexity) the shorter the compressed output will be.

Conversely, you can probably argue that a compression algorithm implicitly defines a language model by the code lengths, e.g., it assumes duplicate strings are more likely than random noise.
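As a toy illustration (made-up word probabilities, nothing to do with a real model), here is Huffman coding assigning shorter codes to the more probable words:

    import heapq
    from itertools import count

    # Toy "language model": unconditional word probabilities.
    # A real LM would estimate P(word | context) instead.
    probs = {"the": 0.5, "cat": 0.25, "sat": 0.15, "zygote": 0.10}

    # Build the Huffman tree by repeatedly merging the two least probable nodes.
    tiebreak = count()
    heap = [(p, next(tiebreak), w) for w, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (a, b)))

    # Read codes off the tree: left = "0", right = "1".
    def codes(node, prefix=""):
        if isinstance(node, str):
            return {node: prefix or "0"}
        left, right = node
        return {**codes(left, prefix + "0"), **codes(right, prefix + "1")}

    print(codes(heap[0][2]))   # "the" gets 1 bit, the rare words get 3

In practice arithmetic coding is used instead of Huffman when the probabilities change on every token, but the principle is the same: better probability estimates mean shorter output.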


> put programs in custom sandboxes, where /bin/bash is found exactly where they expect to find it

NixOS can do that; I think it was necessary to get Steam games running.

https://nixos.org/manual/nixpkgs/stable/#sec-fhs-environment...


Why? Only if you reuse the same data multiple times, or if you use a unique random generator whose output is easily distinguishable from other data. So maybe don't use the 5x5 island, but when done right the approach shouldn't help fingerprinting?


Exactly. Just pick a random location on the planet and have it randomly wander. Have a consistent but (statistically) different location per-app.
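A rough sketch of what that could look like (hypothetical; the app id, step counter, and step sizes are all made up):

    import hashlib
    import random

    def fake_location(app_id, steps=0):
        # Stable per-app base point: seed a PRNG from a hash of the app id.
        seed = int.from_bytes(hashlib.sha256(app_id.encode()).digest()[:8], "big")
        rng = random.Random(seed)
        lat = rng.uniform(-90, 90)
        lon = rng.uniform(-180, 180)
        # Let it wander a little; replaying the same seeded walk keeps it consistent per app.
        for _ in range(steps):
            lat += rng.uniform(-0.001, 0.001)
            lon += rng.uniform(-0.001, 0.001)
        return lat, lon

    print(fake_location("com.example.flashlight", steps=42))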


That's probably because you confused Vim and Kakoune key bindings.

Kakoune

- n: next match

- alt+n: previous match

- shift+movement: extend selection by movement

Vim

- n: next match

- shift+N: previous match

So if you press shift+N as you are used to from Vim, you start adding a lot of selections instead of going to the previous match. I believe this difference is the most confusing for people who switch from Vim to Kakoune.


> Node packages work much better

Are you sure about that? I haven't seen a node app built from source on nixpkgs yet. That includes Electron apps like Signal Desktop, which is a bit disappointing.

There is this article about trying to package jQuery on Guix:

http://dustycloud.org/blog/javascript-packaging-dystopia/


Grep nixpkgs for `buildNpmPackage`; it's ridiculously easy to package a node app nowadays.


Yes, buildNpmPackage works great.


Guix has several different npm importers (none of them merged), but it's debatable whether it is desirable to build npm packages from source when it creates thousands of barely useful packages.


DeepMind's blog post on AlphaDev says:

> AlphaDev uncovered faster algorithms by starting from scratch rather than refining existing algorithms

Finding that specific optimization, especially when given the comments, seems almost trivial by comparison.

Edit: I tried to understand the optimization in question. It is not the full sort3 algorithm; it only works under the assumption that B < C. Given that, the GPT-4 answer is actually wrong, because it wasn't given that assumption.
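To illustrate the point (my own Python paraphrase, not the code from the article): a three-way sort that assumes B < C only has to place A, but feed it inputs that violate the precondition and it returns unsorted output.

    # Hypothetical paraphrase of a sort3 that relies on the precondition b < c.
    def sort3_assuming_b_lt_c(a, b, c):
        # b and c are already ordered, so only a needs to be placed.
        if a <= b:
            return a, b, c
        if a <= c:
            return b, a, c
        return b, c, a

    print(sort3_assuming_b_lt_c(5, 1, 3))   # (1, 3, 5): correct, precondition holds
    print(sort3_assuming_b_lt_c(2, 3, 1))   # (2, 3, 1): not sorted, precondition violated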

