Infinite-Storage-Glitch – Use YouTube as cloud storage for any files (github.com/dvorakdwarf)
376 points by kinduff on Feb 20, 2023 | 192 comments



Literally the modern equivalent of the old video-based backup systems. I remember they existed for both the PC and the Amiga. You would load a blank VHS tape into a VCR, connect the output of the computer to that VCR's input, and then tell the program which data you'd like to back up to the tape. It would generate this flashing "mess" of black and white pixels that you'd record to the tape. To restore, you'd connect the VCR output to a little box that came with the product; it would convert the black and white data in the video signal to a data stream that the program would use to restore your data.

A portion of the signal would be used for timing, metadata and error correction, so the program could tell you if the data was sufficiently damaged upon restore.

LGR has a video on the PC version from Danmere: https://youtu.be/TUS0Zv2APjU

Here's a video example of the Amiga industry's take on the idea: https://youtu.be/VcBY6PMH0Kg?t=573

Sony even did this in 1980 to record CD-quality PCM audio onto VHS tape. https://youtu.be/bnZFLzBO3yc


We used to use regular audio cassette recorders to store/restore data on the TRS-80 before hard/floppy drives. It's also how you backed up/restored midi data from early synths. It basically just sounded like an early dial up modem transmitting data when you played it back as audio.

https://www.youtube.com/watch?v=-nHrjqmt_wQ


> It's also how you backed up/restored midi data from early synths.

In a weird closing of the circle, I now store the internal sounds backup of my vintage Juno 60 synthesizer as a WAV file recorded from that tape backup output.

So the digital info of the internal synthesizers gets converted to analog audio in the synth, then passed as audio to my modern computer’s audio interface, which converts it to a digital representation of the analog audio.

And vice versa to restore the backup into the synthesizer’s memory.

Incidentally those backups are more reliable now than when using analog tape decks, since one doesn’t encounter physical tape degradation or a cassette deck “eating” the tape.

I haven’t done any testing with compressed audio formats, but I would expect even lossy formats to perform well, if one keeps the lossiness within certain bounds, so that the highest frequencies in the audio file are preserved.


> I haven’t done any testing with compressed audio formats, but I would expect even lossy formats to perform well, if one keeps the lossiness within certain bounds, so that the highest frequencies in the audio file are preserved.

MIDI as a compression format for that kind of audio data would be a lossy way to encode such an audio stream, and it certainly would perform well, so yes, such lossy formats do exist.

Most research in audio compression has been done on compressors that exploit the limits of human perception, though, so off-the-shelf lossy compressors may not do very well.


Calling MIDI a form of lossy audio compression is a stretch. It's like saying whistling a tune is a lossy way to compress a song.

Yeah you're transmitting information about the source but to call it lossy is an understatement.


> It basically just sounded like an early dial up modem transmitting data when you played it back as audio.

Modern synths still do this. The Korg Volca has a library for converting audio into white noise that reprograms/adds more samples.


My Apple ][+ had a tape interface. It mostly worked - if the tape stretched or if the tape speed changed for some reason (dirty capstan, power supply fluctuations, low volume, high volume, evil pixies) then you wouldn't be able to read it back.

This site describes the format, which was basically a header tone, a sync tone, data bits, and then a checksum (not described there but other sites say it was just an XOR). When we got a Disk ][ (5-1/4" floppy drive) all those issues went away.

http://www.applevault.com/hardware/apple/apple2/apple2casset...


That's how you loaded games onto my Spectrum 48k; you had to make sure the volume on the output wasn't too high or too low. I'd guess it had a bandwidth of about 2-3kbit from how long I think it took to load (nearly 40 years ago).


It varied with the content - a zero had a high-pitched tone and took half as long to transmit as a one - but it averaged out at around 1800bps.

Certain things had really distinctive sounds, like loading screens. I could recognise Manic Miner loading from just about any ten second chunk.
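
For fun, here's a minimal sketch of that kind of tone encoding using only Python's standard library. The frequencies and burst lengths are illustrative, not the actual Spectrum timings; the point is just that a zero is a higher-pitched burst lasting half as long as a one:

    import struct
    import wave

    RATE = 44100

    def tone(freq_hz, cycles):
        # One square-wave cycle at roughly freq_hz, repeated `cycles` times.
        period = RATE // freq_hz
        cycle = [32000] * (period // 2) + [-32000] * (period // 2)
        return cycle * cycles

    def encode(bits):
        out = []
        for b in bits:
            # A zero is twice the pitch and half the duration of a one.
            out += tone(2400, 4) if b == 0 else tone(1200, 4)
        return out

    with wave.open("bits.wav", "w") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(RATE)
        samples = encode([1, 0, 1, 1, 0, 0, 1])
        w.writeframes(b"".join(struct.pack("<h", s) for s in samples))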


Atari too; and god they were unreliable. But cheap!


There was also DVStreamer for Windows and other tools for other platforms which would store data on MiniDV tapes. This is of course a bit less interesting than storage to VHS, since MiniDV was already storing a bitstream, but still a clever oddity. I think you could store ~13.5GB in SP mode or 20GB in LP mode (reduced error correction).


Or the audio-based ones, like the Commodore cassette storage device. A guy in my neighborhood had one as part of Pac-Man contest winnings.

https://en.wikipedia.org/wiki/Commodore_Datasette


Had one of these with my C64. When my floppy drive broke down, I actually ended up ordering a copy of "Silent Service" on cassette from someone in Great Britain. It kept me sane while I saved for the floppy repair.


I had one in the late 90s that used 8mm tapes and my video camera in the same way. Could store a ton of stuff.

It was pretty finicky, though, and very slow.


I just had a flashback to the Nintendo e-Reader I had as a kid.

Black and white dots in a strip on a card. Swipe the cards to load the games.


While never popular as backup devices, VHS and larger helical scan tape formats were used in a number of interesting niche data storage applications.

Here, for example, is a ruggedized S-VHS data recorder built for military and aerospace applications:

http://www.thic.org/pdf/Jul97/metrum.cduckling.pdf


Minor nit - the Sony PCM adaptor worked with a Betamax deck. My parents did a documentary in the late 80s where almost all the original music for the soundtrack was recorded with a Sony PCM-1 and SL-F1. I wish I still had the masters for it.


I’m sure it would work with any recorder that could record a sufficient quality signal. It was just NTSC video.


Beta had slightly higher bandwidth but probably not enough to make a difference.

These days you get four channels of 96kHz 20-bit audio in a wee box the size of a Betamax tape, with hours and hours of recording on an SD card. That physical size is mostly a function of needing half a dozen XLR connectors on it and a big enough screen to see what you're doing.


This was once very common: https://en.wikipedia.org/wiki/ArVid


This is why I really like HN...

(IMO there are not enough of these posts, and they're getting rarer over time.)

A refreshing "actual hacker" project that makes me look anew at the tools I always use...

So, my coffee maker is sending data to the net - maybe I can use that for backup, and have it replicated both in the fridge and in the living room lights...

But how would I retrieve that? Hmm. I assume that both Alexa and Google assistant are tracking everything that goes through my IoT devices. I'll ask GPT how to hack my Nest device to pull back data on demand, that oughta work, surely?! :D


Yes - more of this please :)

Tangentially related and discussed in the past on HN: File transfer via color barcodes and a phone camera

[0] https://news.ycombinator.com/item?id=25459501

[1] https://github.com/sz3/libcimbar

[2] https://cimbar.org


Aha! I was wondering where the github stars were coming from. :)

I did get a kick out of this from the OP: > Binary: Born from YouTube compression being absolutely brutal. RGB mode is very sensitive to compression as a change in even one point of one of the colors of one of the pixels dooms the file to corruption.

It's more than youtube compression -- video compression in general wreaks absolute havoc on our meticulously arranged (and sometimes colored) pixels. It's actually pretty fun/instructive to step through the transition between (what you want to be) two distinct frames when you're trying to (ab)use video for this sort of use case -- there are segments of the frames that get correlated and "flip" together first, resulting in in-between frames that are gibberish even with a modest amount of ECC in play.


Oh boy, OpSec should pay attention to this.


It starts with "no monitors facing windows" and "all visitors hand over phones and any other devices with photographic possibilities" and moves up the paranoia/professional caution scale from there.


Agreed! Here's a fun video by suckerpinch trying out some truly insane data storage ideas https://youtu.be/JcJSW7Rprio


The ping database is one of the most unhinged and fascinating ideas I've ever come across. It's such a crazy concept.


HN hasn't really been like Hackaday for a long time.


Previously on 'Esoteric Filesystem Week':

0. Linux's SystemV Filesystem Support Being Orphaned https://news.ycombinator.com/item?id=34818040 by rbanffy 3 days ago, 70 points, 73 comments

1. TabFS – a browser extension that mounts the browser tabs as a filesystem https://news.ycombinator.com/item?id=34847611 by pps 1 day ago, 961 points, 185 comments

2. Vramfs – GPU VRAM based file system for Linux https://news.ycombinator.com/item?id=34855134 by pabs3 1 day ago, 226 points, 71 comments


Maybe it doesn't have a post of its own, but I found these esoteric storage methods greatly entertaining as well: https://www.youtube.com/watch?v=JcJSW7Rprio


Tom7 is a national treasure.



Now we need ytFS FUSE driver to random read these pattern videos. Anyone? ;)


It isn't exactly a "glitch", just something Google doesn't care about (but absolutely will care about if too many people start doing it).

I remember way back in the day someone came up with a clever way of using Gmail attachments to build a cloud storage drive mounted to your filesystem. Then Google themselves released Drive soon after.


I doubt "too many people start doing it" is ever going to happen.

Obviously this is so difficult to use that most people would rather pay $10/month to get 1TB of storage that can be very easily accessed. Even if someone has 100TB of data and wants to back it up, I don't think they would do the conversion to and from YouTube videos.

An interesting idea, but probably won't get much real world use.


Pirates will take advantage of any suitably easy to use storage. I think YouTube is probably a poor target these days, though - Google's Denial of Service can probably detect something like this in pretty short order.


You also run the risk of YouTube deleting your videos / banning your account. I’m sure they wouldn’t appreciate being used as a generic backup provider.


Nice, until Google introduces a new compression algorithm that says: hey this looks like noise, let's replace it by this other user's noise so we can save on storage costs.



I like the novelty of this project, but if you value your Google account I wouldn't try this out.

Google has been known to close accounts and "related" accounts for abuse (as defined by them). So even if you create another account, don't expect your main account to survive if there's any possible link between them.

They are the judge, jury, and executioner, so eff around at your own peril.


$20 a month gives you "unlimited" storage at Google. They've gladly taken my encrypted files for years now and I'm up to 80TB. I think it's more than reasonable to pay them for that type of service and be slightly above board (the account type I have says I need a minimum of 5 people, but it's just me).


Which means that if, for whatever reason, they decide to close your account - say one of the pictures in those 80TB triggers something that looks like CSAM [1] - you are seriously up the creek.

Ditto if someone gets hold of your phone and changes the login on your account, or they decide to not let you in because something "looks suspicious".

You are brave. I hope, for your sake, you have a local backup.

[1]: https://9to5google.com/2022/08/22/google-locked-account-medi...


How long does it take for you to download 80TB? From what I can see Google allows you to download 10TB per day but who knows when they will change that limit.


My workflow doesn't have me redownloading the entire set. I have the drive mounted, so it's more light push and pull.


Even with a gigabit internet connection that would take a couple hundred hours.


A useful rule of thumb to remember: 1 Gbit/s is about 10TB/day.
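
The arithmetic behind that rule of thumb, as a quick sanity check:

    bytes_per_sec = 1e9 / 8              # 1 Gbit/s = 125 MB/s
    bytes_per_day = bytes_per_sec * 86400
    print(bytes_per_day / 1e12)          # ~10.8 TB/day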


That doesn't seem to be the case anymore. You have to pay for all the users to get the benefit of unlimited storage.


I must be grandfathered in. I pay $20 flat per month.


Are you paying month to month or yearly? I don't know if you can rely on that storage being available. See below.

https://www.zdnet.com/article/what-happens-to-your-g-suite-u...


I pay $20 a month


You could have got lucky but a bit risky to rely on this data being there IMHO.


$100/mo for absolutely unlimited is still an incredible bargain. $20/mo is in the neighborhood of almost free.


A hundred dollars a month is only an incredible bargain if you have huge amounts of data.

The average person could buy a $100 external drive and replace it every five years, and that would be enough.


$20/mo x 12 months = $240 annually.

$240 annual x 75 years = $18000

Almost free huh?

$12000 a year x 75 = $90000

If I could pay that in and lock it in for the duration, maybe I'd consider that, but no one is going to let you do that.

Y'all got some funny notions on "Free".

Then there's the whole issue of "What if Google gets bored?"


Where did the $12,000/year come in?


I think they messed up $240/yr * 5 people. Which is $1200/yr. Or $100/month * 12.


Fack. $1200 a year x 75 years should be $90000 lifetime..


Why 75 years?


Previous average lifespan of a human being. Just needed a number to stop the analysis at. The one that comes bundled with the implication "Welp, I'm dead now" felt appropriate given that if you are dead, and the data is too hard to access, probate will likely be the end of your data storage foray. Any longer, and you're most certainly talking organizational scale preservation efforts.


If you make a Silicon Valley salary maybe, that is.


This information is wrong. Please give a URL to the service you are writing about to prove that it's right, thanks.


It is not wrong. I pay it every month.

https://workspace.google.com/pricing.html

The enterprise plan.


From what I read they have a limit of 5TB per user. What region are you in? Could you provide an archive link or something to prove it?


Which service is that? Doesn't Workspace allow 1TB?


https://workspace.google.com/pricing.html

Enterprise is $20 (for me at least)


how do you manage the encryption/decryption?


One option is Cryptomator: https://cryptomator.org/



I use Rclone but sadly started this process before it supported crypt so I use encfs


Love rclone!


Borg backup


https://diskprices.com

Price per TB appears to have fallen below $8. So that's $640 worth of storage. Basically, if you were to buy your own hard drives it works out to about $20/mo over two years..


I'm betting Google Storage is a little more fault tolerant...


This particular account, while loss-making for them, is not so by all that much.

A comparable Cloud Storage account on GCP with Coldline storage would be $320/month ($0.004/GB/month), or just $96/month for archival ($0.0012/GB/month).

The actual cost to Google is probably < $80/month for this 80TB (most of the data is going to be stored in some version of archival storage, given the standard restriction of 10TB/day on export).

80TB is also a heavy outlier; given today's typical bandwidth and commercially available disk sizes, it takes a lot of dedicated effort and time for most users to upload that much data into the cloud.

Also, Google's personal storage pricing is not competitive for pure storage - Backblaze is only $7/month, for example. The higher price and value derive from being able to integrate with other Google products and provide storage for them, like Gmail, Photos, etc.


Depends on the fault. Disk errors, fire, theft? Yes. Account suspension? Hmmm...


or change in EULA!


The other reply mentions Backblaze. Whether you choose to use them or not, their published drive statistics are quite useful:

https://www.backblaze.com/blog/backblaze-drive-stats-for-202...

A well-chosen model has an AFR of well below 1%. To get, say, about 100TB, you'd need a dozen drives or so with ZFS and a nice enclosure. It is unlikely that even one of them will fail in a given year, and you will not experience data loss.

Here is a $100 case: https://ja.aliexpress.com/item/1005003125774264.html

Here is some YouTuber shoving 100TB into it: https://www.youtube.com/watch?v=boKmZKTKXHc


8 dollars for a TB of storage, man. It still makes me feel awestruck sometimes when I see stuff like a $23 3TB HDD.


You’re not accounting for redundancy, administration cost, electricity, heat management, or servers to hold the drives.


The downside of using YouTube for backups is that the comments on your backups are so toxic.


Don’t forget there is no appeal process (let alone the ability to talk to a human)

What a brave new world.


This is starting to change. India has a new law requiring social media companies to have a grievance officer and a formal grievance process that allows users to speak to an actual human. It lays out a set of valid reasons to suspend a user, and cannot suspend or penalize a user for reasons not on the list, and must do so in a fair manner as prescribed by law. If the grievance process fails it can be appealed to a government office and then courts.


Presumably even the BBC could use them...


Bold of you to assume hn hasn't fully convinced me to abandon everything but maps ;)

The 4x size increase is my biggest concern...too bloaty.


Don't forget that YouTube compresses videos, so the extra filesize makes the videos resistant to that destructive process.


I wrote something just like this with Discord, and I even got it to host full videos which you can play back in browser. It's a good backup service. [0]

I want to expand this into a fully modular service where you write payloads and scripts for various services, so when you upload a file it's spread out across many different providers. When you're downloading, you just go down the list, check what still exists, and verify the checksum. This should be stable for many years.

I plan to take a look into Facebook and see what can/can't be accessed there. I had this exact thought with YouTube and considered using a pixel reader to extract the data. Same idea for different image hosting services like Imgur.

[0] https://github.com/5ut/DiskCord


The author says another Discord project served as inspiration: https://github.com/pixelomer/discord-fs

Maybe you could join forces.


I've observed that with any piece of technology where you're permitted to write/upload information and freely access it afterwards, someone will attempt to (ab)use it for file storage and write a blog article about it later :)

My favorite example of this was people storing files in "secret" subreddits by using posts and comments to store bytes. When they were later discovered by other users, the seemingly random strings sparked a huge conspiracy about their possible meaning.

However, you always have the problem that your unwilling host may remove your "files". I sometimes wonder about file storage using a textual output format that can't be distinguished from normal user interactions.


I remember when GMail was invite-only, and at the time they were offering quite a bit more storage for Mail than anyone else, so people started using their GMail drafts to store files.

That was the first time I came across such a thing.

Someone even made an extension for Windows XP that allowed you to mount GMail as a storage volume.

> GMail Drive is a Shell Namespace Extension that creates a virtual filesystem around your Google Mail account, allowing you to use Gmail as a storage medium.

http://www.viksoe.dk/code/gmail.htm


GmailFS was another early implementation.


Writing the GmailFS HOWTO, and fixing a bug in the process, was my first exposure to the power of OSS. Looking back, I'm pretty sure this is what led me to pursue software engineering as a career!


If you get a job at Google you might even recoup the costs of GmailFS ;)


> I sometimes wonder about file storage using a textual output format that can't be distinguished from normal user interactions.

You could use a reproducible LM (for instance, using Bellard's NNCP as a basis), and encode one bit in one word by taking the {first, second} most probable next word.
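
A toy sketch of that encode/decode mechanic in Python. The "model" below is a hypothetical stand-in (a deterministic ranking derived from a hash); the comment suggests something genuinely reproducible like Bellard's NNCP, but the bit-per-word mechanics are the same:

    import hashlib

    VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "here", "there"]

    def ranked_words(context):
        # Stand-in for a reproducible LM: a deterministic "probability"
        # ranking of the vocabulary given the preceding context.
        key = lambda w: hashlib.sha256((context + "|" + w).encode()).digest()
        return sorted(VOCAB, key=key)

    def encode(bits):
        context, words = "", []
        for b in bits:  # bit 0 -> most probable word, bit 1 -> runner-up
            word = ranked_words(context)[b]
            words.append(word)
            context = word
        return " ".join(words)

    def decode(text):
        context, bits = "", []
        for word in text.split():
            bits.append(ranked_words(context).index(word))
            context = word
        return bits

    msg = [1, 0, 0, 1, 1, 0]
    assert decode(encode(msg)) == msg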


This is fascinating! And the file transfer can be then fully disguised as a conversation, with a ChatGPT-like client and all. An unsuspecting user will see a chat bot; a specialized client app would be able to receive files by talking to it.


In this modern cloud-giant world it's abused for file storage yes. But I come from the more traditional web hosting world of the early 2000s and back then the general rule was that anything that could store information online would sooner or later be used to store porn.


How about storing your files in other people's DNS caches?

https://blog.benjojo.co.uk/post/dns-filesystem-true-cloud-st...


Discussed 5 years ago:

https://news.ycombinator.com/item?id=16134041 (36 comments)


In completely unrelated Hacker News :

"Ask HN: What are these strange random strings spamming my blog?"

https://news.ycombinator.com/item?id=34865695


Now I want to write a blog post about storing files inside of blog posts about storing files inside of blog posts …<error: recursion limit reached>


This makes me think of Turing machines which store their own code inside themselves, which you can use for all kinds of interesting proofs. I wish I could find more about this.


Look into Squeak/Smalltalk. It is an operating system/desktop/IDE with self contained compiler.


> I sometimes wonder about file storage using a textual output format that can't be distinguished from normal user interactions.

I guess it depends on what noise-to-signal density you’re after.

With a long enough ChatGPT-generated output, no one would question a few out-of-place characters or even an emoji. And with 3000+ different emojis to choose from, a single emoji can encode an entire byte of data.

Another idea is using “they’re”, “their”, “there” as bits.
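
The emoji scheme is easy to sketch: any fixed table of 256 distinct emoji gives you one byte per character. The codepoint range below is just an illustrative pick:

    EMOJI = [chr(0x1F400 + i) for i in range(256)]

    def encode(data: bytes) -> str:
        return "".join(EMOJI[b] for b in data)

    def decode(text: str) -> bytes:
        return bytes(EMOJI.index(ch) for ch in text)

    assert decode(encode(b"hello")) == b"hello"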


I vaguely recall some secretive company (perhaps Apple) using adjustment of spacing, capitalization, etc. to encode a unique serial number in messages sent by the CEO, which could then be used to trace leaks.


Genius.com hid the message “red-handed” in Morse code using alternating quote characters to prove that Google was displaying their lyrics.


CIA and similar have been doing that for something like 50 years. Made it into Tom Clancy novels in the 80s IIRC.


Elon Musk was the one claiming to do it


Many people claim to do it.

It's a plot point in Patriot Games, the 1987 Tom Clancy novel that introduced the term "canary trap" for this trick. He says he invented the term, but not the technique, which was already in use.

In a spat over the plot of Star Trek III (so, early 1980s), Harve Bennett distributed slightly different versions of the script, allowing him to track a leak back to Gene Roddenberry.

The book SpyCatcher says it was in routine use at MI-5, and you can find variations of it in lots of fiction too.


> When they were later discovered by other users, the seemingly random strings sparked a huge conspiracy about their possible meaning.

Makes me wonder if numbers stations are actually just the world's slowest modems.


Pretty easy to do that, use a fixed point implementation of GPT(N) of whatever size you like and range code your data into the model probabilities. This also will achieve a close to rate optimal embedding-- allowing you to embed about as much data as the language model thinks the text has...

If you encrypt the data and include a checksum or other identifying bytes in the ciphertext you can even have unwitting human participants in the discussions and if their posts are context your embedded data will be credible replies. You just have to be sure that threading behavior doesn't make it impossible to give the decoder identical context.


> However, you always have the problem that your unwilling host may remove your "files". I sometimes wonder about file storage using a textual output format that can't be distinguished from normal user interactions.

Well, with ChatGPT that's almost trivial. POC: https://imgur.com/fQvMh9S


One of the things I’ve successfully used YouTube for was video storage of my security camera system. Unlimited video storage with a simple app to watch them in case I need to check something out!

And it’s simple: camera uploads automatically via FTP, inotifywait script uploads to google!


Shhhh.. don’t you know what the first rule about YouTube storage for security systems is??


Right this moment an engineer at Google is writing a personal OKR to block this and declare $$$ savings in order to get promoted next year.


All they'd have to do is limit the amount of private videos you're allowed to store. If your only option for storing unlimited security footage is to make it public, then people probably wouldn't do that.

Alternatively, if they're allowed to use the footage to train some AI that will help them take over the world, then maybe they want all your random footage for free.


Security cameras are usually low-framerate and compress highly anyway, due to not much happening between each frame, so I doubt it's going to be a significant cost in comparison to all the other, far more massive content which is also constantly being uploaded.


haha this is so cool, i made something similar https://punkjazz.org/scrambled-eggs/ few years ago to explore transferring files directly through the camera so nobody can "see" what you download, because no packets go through the internet, i managed to do 10kbps or so

the modern qr readers are so fast and easy to use, its unbelievable


Nice! It's such a neat way to transfer information :)

This guy extended the idea using fountain codes, which allows you to miss arbitrary frames and still recover the full message without waiting for the missed frames to re-appear:

https://divan.dev/posts/fountaincodes/
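
A bare-bones sketch of the fountain-code idea: each "droplet" is a seeded random subset of source blocks XORed together, and the decoder peels degree-one droplets until everything resolves. Real LT codes use a soliton degree distribution rather than this uniform one, and decoding is probabilistic, hence the excess droplets:

    import random

    def droplet(blocks, seed):
        # XOR a seeded random subset of the source blocks into one droplet.
        rng = random.Random(seed)
        idxs = rng.sample(range(len(blocks)), rng.randint(1, 3))
        payload = bytes(len(blocks[0]))
        for i in idxs:
            payload = bytes(a ^ b for a, b in zip(payload, blocks[i]))
        return seed, payload

    def decode(droplets, n_blocks):
        known, pending = {}, []
        for seed, payload in droplets:
            rng = random.Random(seed)  # replay the encoder's subset choice
            idxs = set(rng.sample(range(n_blocks), rng.randint(1, 3)))
            pending.append((idxs, payload))
        changed = True
        while changed and len(known) < n_blocks:
            changed = False
            for idxs, payload in pending:
                unknown = idxs - known.keys()
                if len(unknown) == 1:          # peel: one unknown block left
                    i = unknown.pop()
                    for j in idxs - {i}:
                        payload = bytes(a ^ b for a, b in zip(payload, known[j]))
                    known[i] = payload
                    changed = True
        return [known.get(i) for i in range(n_blocks)]

    blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    drops = [droplet(blocks, s) for s in range(16)]
    print(decode(drops, 4))  # with enough droplets, recovers all four blocks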


One could do side-channel sliding window hand shakes with audio, to improve download performance. :-)


i was actually thinking about that, could be even more cool now with modern text to speech and whisper and some funky word based encoding with huge dictionary like:

teacher: 0b00010010101001, school: ...

and then the website can encode the data as a sentence and just text to speech it and the receiver can use whisper to speech to text and decode

will be the most creepy thing because it can be very steganographic and sound like a real sentence


Yeah, who would abuse a free service to store their personal files?

Data Block: c3828abe

c5cfe61f4e61c9eda05e39903df580566859708a52957754e06fd18feaceca5ec0cdcac4b24b0f9ac8d9f212301916ea9ebcb2e291e2e950e0118f150c8cde02 34e770773cb93d6f1b757098890475cb00bef5ca4275c51021118ac1f01b71db3604063fd945480afc6b6b5b8d125129f7a9813a4997bdea27bbe5f6c17abfeb f46309c93430f78d37d23c0ef646cf7796e6de2b072d771b35b832a5b5328d1c09c5d32eaf6309b3119e8468ed02f62cd4b25c6785792ec82edc72667da8e36e 3b7b0d22fd708f5a3ff4787bf9474f84dff52fe33a38f4b4fee6759498b38d2c3af01db8d3dc5b1bb1cf6d203f24a4f6016caf42ad5cac76d1b0a0bf01a435b0 54a288c7cf9859dde401af51685eef23661ff0102a94caab2df9bf298c07538885baec81576513b9a7591d429db24b221c071cf0d929308243b0af4535810052


Video steganography might be a better approach and would be less likely to trigger account banning or claims of abuse by the hosters. The issue of avoiding data loss due to lossy compression algorithms seems to be an active area of research:

https://jis-eurasipjournals.springeropen.com/articles/10.118...

> "Moreover, most video-sharing channels transmit the steganographic video in a lossy way to reduce transmission bandwidth or storage space, such as YouTube and Twitter. . . Robust video steganography aims to send secret messages to the receiver through lossy channels without arousing any suspicions from the observer. Thus, the robustness against lossy channels, the security against steganalysis, and the embedding capacity are equally important."

I suppose in this project, the blocks of pixels are large enough to avoid data loss due to compression?


You could do this with any service which accepts user content. You could have a tumblr blog focused on “paranormal phenomenon in white noise images” and fill it full of data embedded in images. If anyone ever asks you just explain that like many pattern illusions not everyone can see images contained within - try squinting, or covering the eye on the predominant side of your body, stand on your head, blah blah blah.


> fill it full of data embedded in images

This is even easier, because JPEG decoders ignore additional data past the end of the file. Post a low-res ~200kb JPEG that has an additional ~20mb of data appended. It'll still render perfectly fine.
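
A sketch of that trick (most JPEG decoders stop at the FF D9 end-of-image marker and ignore trailing bytes; as the reply below notes, any host that re-encodes the image will strip the payload):

    def embed(jpeg_path, payload, out_path):
        # Append arbitrary bytes after the JPEG; viewers render it unchanged.
        with open(jpeg_path, "rb") as f:
            img = f.read()
        with open(out_path, "wb") as f:
            f.write(img + payload)

    def extract(stego_path):
        with open(stego_path, "rb") as f:
            blob = f.read()
        # Naive: split at the first FF D9 marker. Real JPEGs can contain
        # FF D9 inside embedded thumbnails, so a robust version would walk
        # the marker segments instead.
        eoi = blob.find(b"\xff\xd9")
        return blob[eoi + 2:]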


Most platforms compress uploaded images, which would result in the appended data being removed.


You could do the same thing with PNGs and different chunk types. Although in both cases you run the risk that some paranoid developer might filter out unexpected chunk types or additional data, so in both cases it would be best to put the data in the image payload.

The other consideration is that Tumblr was always very “creator” oriented and while they might produce thumbnails of various sizes the original image is still available and not mangled by resizing algorithms. Other free image hosts are going to crush that image down the maximum amount tolerable to the human eye. Google even does that for paid photo hosting.


I understand that the goal is to make the data survive video compression, but wouldn't it make sense to use at least some color information instead of entirely black and white pixels?


Chroma is lossier than luma in most common video codecs. AVC is 4:2:0 on YouTube. 4:2:0, quite confusingly, means that chroma is halved in both dimensions compared to luma (so one chroma pixel is congruent with four luma pixels). As well, most decoders will apply filtering on the chroma to upsample it to match the luma, meaning that your color boundaries are going to be indistinct at best, and you might even lose the original chroma values entirely in the process. You'd have to use multiple chroma pixels as one metapixel in order to increase resilience, which would diminish the capacity. With modern codecs, a monochrome signal seems better to use for actual data, although I could see it being useful to use chroma for metadata.
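
To put rough numbers on that, a back-of-envelope comparison for a 1080p frame, assuming (hypothetically) one symbol per 2x2 block at each plane's native resolution:

    w, h = 1920, 1080
    luma_syms = (w // 2) * (h // 2)      # 2x2 luma blocks
    cw, ch = w // 2, h // 2              # 4:2:0: chroma halved per dimension
    chroma_syms = (cw // 2) * (ch // 2)  # 2x2 chroma-sample metapixels,
                                         # i.e. 4x4 luma pixels each
    print(luma_syms, chroma_syms)        # 518400 vs 129600 symbols per frame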


Seems like it could benefit from forward error correction to defend against bit errors (this is how QR codes survive big chunks being partially obscured or replaced by logos, and also how CDs survive being scratched within certain limits).


that error correction greatly inflates the final size of the QR code, too.

there should be some error correction in a system like this, though.


You can choose how much correction you get, in terms of how many bit errors you can correct per 'n' bits. And you need surprisingly few bits to get pretty great performance under "reasonable" bit-error rate channel (like under 10% overhead). You can wind up the strength of the error correction if you anticipate a noisier channel.

QR codes have 4 levels of correction you can use depending on how robust you wish them to be. CDs and DVDs use two chained, fixed, levels to keep the decoders simple. CDs have 25% overhead, but their correction is very strong: they can correct 4000 bits in a row.
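
As a concrete illustration of how little machinery single-error correction needs, here is the classic textbook Hamming(7,4) round trip (not the code QR or CDs actually use, but the same principle): 4 data bits become 7, and any single flipped bit is located by the syndrome and fixed:

    def hamming74_encode(d):               # d = [d1, d2, d3, d4]
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4                  # parity over positions 3, 5, 7
        p2 = d1 ^ d3 ^ d4                  # parity over positions 3, 6, 7
        p3 = d2 ^ d3 ^ d4                  # parity over positions 5, 6, 7
        return [p1, p2, d1, p3, d2, d3, d4]

    def hamming74_decode(c):
        c = c[:]
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
        syndrome = s1 + 2 * s2 + 4 * s3    # 1-based position of the bad bit
        if syndrome:
            c[syndrome - 1] ^= 1           # flip it back
        return [c[2], c[4], c[5], c[6]]

    code = hamming74_encode([1, 0, 1, 1])
    code[4] ^= 1                           # corrupt one bit in transit
    assert hamming74_decode(code) == [1, 0, 1, 1]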


Seems that you did not read the README:

There are two encoding modes, RGB and B/W. It uses a pixel-to-data ratio of 2x2, but the README says YouTube's compression algorithm is brutal, and (in RGB mode) one corrupted pixel already renders the whole file corrupted.


Using full RGB clearly wouldn't work, but I do wonder if you could use the color channel for something, possibly redundancy information.


Very likely you could get away with at least 4 bits (16 distinct colors) per pixel, which is 4x more efficient than pure black and white.


I think the size of the effective chroma metapixels is more important than the range of values. You need to make them larger in order to keep the decoder from blending them together when upscaling the 4:2:0 chroma.

Now, if you're using a 4:4:4 format to do this, then you should be able to use smaller chroma metapixels (I still wouldn't use the full chroma resolution, though, unless you're using a high bitrate or a lossless codec). However, that risks data corruption if passed through a pipeline that downsamples the chroma.


This reminds me of a stupid idea I had: would it theoretically be possible to store data using the backbone of the Internet itself? You'd bounce packets (probably TCP) back and forth between two hosts with bytes that aren't actually written to a disk anywhere so they just exist as a stream until one end decides to copy a section for itself.


This isn't a new idea. It can be traced back to delay-line memory [1] and many thought experiments have been suggested to use a large network as such. Even some actual demos have been made [2][3].

1. https://en.wikipedia.org/wiki/Delay-line_memory

2. https://code.kryo.se/pingfs/

3. https://www.shysecurity.com/post/20120401-PingFS


Suckerpinch did a video on "harder drives" where he implemented a block storage device by storing data in ping packets. It's one of my all-time favourite technical talks - his style is amazing, and he's an incredible storyteller.

https://youtu.be/JcJSW7Rprio


Am I correct in remembering a version of this video that existed before the COVID tests?


You're effectively relying on two computers to be up and running 24/7. It'd be twice as good an idea (which is still a very low number) to just store that data in RAM on a single device, rather than rely on two.


This was posted 2 days ago also (but received very little attention): https://news.ycombinator.com/item?id=34850643


Someone also posted a similar project here a few months back: https://news.ycombinator.com/item?id=31495049


This is bound to get you banned. I would do it a little more cleverly (with lower bitrate/throughput/storage sizes)...

Encode the data inside audio, preferably outside the human audible range, and then use a nice video of singing birds, or whales talking, and use the "hidden" frequencies to hide the data.

I don't know if Youtube has any filters that cut out frequencies, but this way they can't ban you, since you've uploaded a really nice personal video of your singing birds, instead of the conspicuous looking QR-like codes as in the OP ;-)


> preferrably outside human audible range

With any lossy audio compression algorithm, everything outside the human audible range is filtered away completely as a first step. That's compression 101.

Also there's much less bandwidth in the audio channel than the video channel, and then far less again if you're trying to hide a signal in another signal.


Even without compression, most audio (including YouTube's) is sampled at 44.1kHz which filters anything >22kHz. https://en.wikipedia.org/wiki/Nyquist_rate


Do this at your own risk. I've done this with lidar data (which didn't need to be as precise as binary, which is what I'm seeing in this post), and it worked fine. 3 years later I revisited the project and it was broken, because YouTube had compressed the files in such a way that the lidar was just inaccurate enough to be unusable. I can't imagine storing data in binary, where just one bit wrong screws everything.


I have many old videos that have lost their "HD" encoding, and now look like potato vision. I no longer (silly that I did) trust YouTube for video storage.


Until Google bans your account completely across all services with no means for appeal.


Using video formats to store other data has a long history.

ADAT for example.

https://en.wikipedia.org/wiki/ADAT


Nice work! I made a much worse variant of this years ago, with a “mosaic” mode[1]: whatever YouTube was doing for compression at the time handled multiple QRs tiled next to each other much better than it did a single large one.

[1]: https://github.com/woodruffw-hackathons/where-tube


Off topic

Does YouTube let you store unlimited video content (real video like screen recordings etc of our own work - no shady or sneaky stuff, nor any copyrighted stuff etc)

With all videos marked private... so they are just "storage" for the account owner, where no other users can access them and YouTube cannot monetize them?


Apparently? We do a bunch of private videos for storage (many are also unlisted) and have no complaints.

I wouldn’t use it as my ONLY backup of course.


There was a thread here a while back where someone lost years of corporate training content when YouTube deleted it.

If it's anything vital, as in your paycheck depends on it, I'd have multiple backups.


Oh... Very interesting!! Hoping someone has the answer here.


Hmmm I’m not convinced.

I had a good look into these sorts of technologies but the host almost always changes the file so it makes it impossible to retrieve the data hidden in the file.

You need a file hosting platform that guarantees not to change the uploaded file.

How does this avoid such problems ?


If you look at the example video, it doesn't depend on the video not being changed, but it does depend on a minimum level of quality. That is, as long as the video quality is high enough (720p in this case) to get back the original black and white pixels, you're fine. The data is not hidden, it's there in plain sight in the video.


OK I'm convinced. I like it!


It's described in the README. The video has 2x2 pixel blocks that are either black or white, so each one signifies a bit. So a 1920x1080 frame encodes 518,400 bits = 64.8KB.

The assumption is that video compression won’t mess up those blocks beyond recognition, so you should retain the information as long as the rendered resolution and bitrate don’t drop too low.

Maybe this could be improved by e.g. using 32 colors instead of 2, and bumping the block size to 3x3 (for safety) which should yield ca 144KB per frame.
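
The capacity arithmetic, for anyone who wants to check it (the 32-color/3x3 variant is the hypothetical improvement suggested above):

    w, h = 1920, 1080
    bw_bits = (w // 2) * (h // 2)          # 2x2 blocks, 1 bit each
    print(bw_bits / 8 / 1000)              # 64.8 KB per frame
    color_bits = (w // 3) * (h // 3) * 5   # 3x3 blocks, 32 colors = 5 bits
    print(color_bits / 8 / 1000)           # 144.0 KB per frame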


The block size should honestly be tuned for the codec in use, chiefly to determine the best block size to fit with the codec's macroblock size. That's usually either 8x8, or with newer codecs 16x16. I feel like something like maybe 8x2 would be smart, and I like the idea of monochrome for resiliency, since chroma is downsampled. The fewer possible pixel combinations you have within a macroblock, the better the compression will probably end up being as well. And 8x2 would somewhat evoke the look of the old video backup systems as well, for the fun of the nostalgia of that.


Thank you for making this! I had the exact same idea quite some time ago but had neither the skills nor the passion to actually create it.

Seeing it come to life has just scratched a long forgotten itch and damn it feels great.


It'd be cool to add a FUSE wrapper around this. At one point I had a POC for a few of these sorts of things going (not as cool as this project, just data stored to X free cloud store/metadata), and creating a redundant transparent FUSE wrapper was probably the next step. With multiple sources, you could even mux data between slow/unreliable sources (content hosts in e.g. Russia or Asia) to 'stripe' the data. And then, you could make these modular so that new sources could be onboarded easily...

Yeah, I really like this stuff. Awesome project.


This has been done many times in the past, one popular tool: https://github.com/dzhang314/YouTubeDrive


This is what it reminded me of. It was posted here on HN a few months ago.


People do this all the time with any web connected service that accepts data. People use open strings in AWS services, like lambda function names, to store arbitrary bits.


> Unfortunately no filesystem functionality as of right now

I chuckled at my own thought that seek (the FS call) could be implemented via YouTube video seeking.


Using this can get your Google account and related IP addresses banned? Isn't this sort of vandalism? But why attack YouTube of all places? Do it to TikTok instead. They won't notice the difference (LOL). I would've said "delete this" normally, but today's political climate definitely demands more free space on the internet per individual, so...


Reminds me of Gmail Drive from years ago, where you could use your Gmail space as a virtual file system.


This looks like a fun project to tinker around with. I have one small request. The pack file in the repo is 146MB. Is there a way to make that smaller on github?

    pack-5d55e1f4809a8dae84591bc04b019b2ae8137f77.pack 146M


Interesting to note how black and white pixels get compressed less and thus make for easier restoration. It would be interesting to generate a 512x512 QR code that changes every 3 frames to see how well that would work for recovering data.


I wonder if you could get a better pixels/bit ratio when using DCT/2DFFT based encoding since you'd still encode lower frequency data but it would be in a format that compression algorithms would also try to maintain.


" I still don't condone using this tool for anything serious/large. YouTube might understandably get mad."

I do love these kinds of hacks, but I hate these kind of weaselly cop-out statements. You made the tool, own it!


Considering this might very well end with you losing your Google account, it is a very necessary warning.

If anything, the author should be clearer about what happens if YouTube gets mad: you might lose your Google account along with access to Mail, Drive, Photos, etc.


The warning is fine, the warning while playing Pollyanna is annoying and disingenuous. To put it in a more constructive way - the author should be proud of the hack and all the fun and turmoil it could cause.


That would be black hattery. They could get people banned. I'd prefer nobody being proud of that kind of thing.


I feel like this is overcomplicating things. You should be able to download the original video you uploaded instead of downloading a compressed version. I'm sure the uncompressed version still exists.


Ensuring that you can retrieve the data from the viewable video means this also works as a method of file transfer, one where the recipient doesn't need to be able to download the originally uploaded file.


Dumb yet interesting idea, but if you care about your data you shouldn't put it on Google, especially if you are abusing their service; and if you don't care, why even waste your time doing all this, except for fun?


I immediately remembered https://en.wikipedia.org/wiki/ArVid


Or I could get the bytes of data, then create a really big hash table - boom, I have discovered a glitch in the simulation.


This is hilarious


"hilarious" is a bit strong. "interesting" feels better to me.


The real trick would be to monetize the resulting videos so you get paid every time you access your storage.


It could probably be hidden in a normal-looking video using steganography. Lower effective bitrate of course.


THAT is 1337!

A true hacker spirit, worthy of the Captain Crunch whistle and its application toward free payphone calls.


i would've saved this "improvement" for my own project, but consider using colours instead of just black and white squares for higher data density! i am unsure how much compression can affect its effectiveness.


> written entirely in Rust my beloved

jesus christ enough with the jerking


This would be really impressive with some steganography.


if only somebody wrote a client for personal backblaze backup (which is also unlimited), can't easily store terabytes from my linux PCs



"You want to try some snowcrash?"


after your finals, you should read about forward error correction.


they're here...

nice end of transmission simulator to boot!


please find better uses of your time. this is such an obvious abuse.



