Literally the modern equivalent of the old video-based backup systems. I remember they existed for both the PC and the Amiga. You would load a blank VHS tape into a VCR, connect the output of the computer to that VCR's input, and then tell the program which data you'd like to backup to the tape. It would generate this flashing "mess" of black and white pixels that you'd record to the tape. To restore, you'd connect the VCR output to a little box that came with the product, it would convert the black and white data in the video signal to a data stream that the program would use to restore your data.
A portion of the signal would be used for timing, metadata and error correction, so the program could tell you if the data was sufficiently damaged upon restore.
We used to use regular audio cassette recorders to store/restore data on the TRS80 before hard/floppy drives. It's also how you backed up/restored midi data from early synths. It basically just sounded like an early dial up modem transmitting data when you played it back as audio.
> It's also how you backed up/restored midi data from early synths.
In a weird closing of the circle, I now store the internal sounds backup of my vintage Juno 60 synthesizer as a WAV file recorded from that tape backup output.
So the digital info of the internal synthesizers gets converted to analog audio in the synth, then passed as audio to my modern computer’s audio interface, which converts it to a digital representation of the analog audio.
And vice versa to restore the backup into the synthesizer’s memory.
Incidentally those backups are more reliable now than when using analog tape decks, since one doesn’t encounter physical tape degradation or a cassette deck “eating” the tape.
I haven’t done any testing with compressed audio formats, but I would expect even lossy formats to perform well, if one keeps the lossiness within certain bounds, so that the highest frequencies in the audio file are preserved.
> I haven’t done any testing with compressed audio formats, but I would expect even lossy formats to perform well, if one keeps the lossiness within certain bounds, so that the highest frequencies in the audio file are preserved.
MIDI as a compression format for that kind of audio data would be a lossy way to encode such an audio stream, and it certainly would perform well, so yes, such lossy formats do exist.
Most research in audio compression has been done on compressors that exploit the limits of human perception, though, so off-the-shelf lossy compressors may not do very well.
My Apple ][+ had a tape interface. It mostly worked - if the tape stretched or if the tape speed changed for some reason (dirty capstan, power supply fluctuations, low volume, high volume, evil pixies) then you wouldn't be able to read it back.
This site describes the format, which was basically a header tone, a sync tone, data bits, and then a checksum (not described there but other sites say it was just an XOR). When we got a Disk ][ (5-1/4" floppy drive) all those issues went away.
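A minimal sketch of such a checksum: a running XOR over the data bytes, which is enough to flag (though not correct) a read-back error. The 0xFF seed is how the Apple monitor ROM reportedly initialized its checksum register; treat that detail as an assumption.

```python
def xor_checksum(data: bytes) -> int:
    """Running XOR of all bytes, reportedly seeded with 0xFF
    by the Apple ][ monitor ROM's tape routines."""
    c = 0xFF
    for b in data:
        c ^= b
    return c

# A single flipped bit anywhere changes the result, so a stretched
# tape or speed glitch shows up as a checksum mismatch on load.
```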
That's how you loaded games onto my Spectrum 48k; you had to make sure the volume on the output wasn't too high or too low. I'd guess it had a bandwidth of about 2-3 kbit/s from how long I think it took to load (nearly 40 years ago).
There was also DVStreamer for Windows and other tools for other platforms which would store data on MiniDV tapes. This is of course a bit less interesting than storage to VHS, since MiniDV was already storing a bitstream, but still a clever oddity. I think you could store ~13.5GB in SP mode or 20GB in LP mode (reduced error correction).
Had one of these with my C64. When my floppy drive broke down, I actually ended up ordering a copy of "Silent Service" on cassette from someone in Great Britain. It kept me sane while I saved for the floppy repair.
Minor nit - the Sony PCM adaptor worked with a Betamax deck. My parents did a documentary in the late 80s where almost all the original music for the soundtrack was recorded with a Sony PCM-1 and SL-F1. I wish I still had the masters for it.
Beta had a slightly higher bandwidth, but probably not enough to make a difference.
These days you get four channels of 96kHz 20-bit audio in a wee box the size of a Betamax tape, with hours and hours of recording on an SD card. That physical size is mostly a function of needing half a dozen XLR connectors on it and a big enough screen to see what you're doing.
(IMO there are not enough of these posts, and they're getting rarer over time.)
A refreshing "actual hacker" project that makes me look anew at the tools I always use...
So, my coffee maker is sending data to the net - maybe I can use that for backup, and have it replicated both in the fridge and in the living room lights...
But how would I retrieve that? Hmm. I assume that both Alexa and Google assistant are tracking everything that goes through my IoT devices. I'll ask GPT how to hack my Nest device to pull back data on demand, that oughta work, surely?! :D
Aha! I was wondering where the github stars were coming from. :)
I did get a kick out of this from the OP:
> Binary: Born from YouTube compression being absolutely brutal. RGB mode is very sensitive to compression as a change in even one point of one of the colors of one of the pixels dooms the file to corruption.
It's more than youtube compression -- video compression in general wreaks absolute havoc on our meticulously arranged (and sometimes colored) pixels. It's actually pretty fun/instructive to step through the transition between (what you want to be) two distinct frames when you're trying to (ab)use video for this sort of use case -- there are segments of the frames that get correlated and "flip" together first, resulting in in-between frames that are gibberish even with a modest amount of ECC in play.
It starts with "no monitors facing windows" and "all visitors hand over phones and any other devices with photographic possibilities" and moves up the paranoia/professional caution scale from there.
It isn't exactly a "glitch", just something Google doesn't care about (but absolutely will care about if too many people start doing it).
I remember way back in the day someone came up with a clever way of using Gmail attachments to build a cloud storage drive mounted to your filesystem. Then Google themselves released Drive soon after.
I doubt "too many people start doing it" is ever going to happen.
Obviously this is so difficult to use that most people would rather pay $10/month to get 1TB of storage that can be very easily accessed. Even if someone has 100TB of data and wants to back it up, I don't think they would do conversion to and from YouTube videos.
An interesting idea, but probably won't get much real world use.
Pirates will take advantage of any suitably easy to use storage. I think YouTube is probably a poor target these days, though - Google's Denial of Service can probably detect something like this in pretty short order.
You also run the risk of YouTube deleting your videos / banning your account. I’m sure they wouldn’t appreciate being used as a generic backup provider.
Nice, until Google introduces a new compression algorithm that says: hey this looks like noise, let's replace it by this other user's noise so we can save on storage costs.
I like the novelty of this project, but if you value your Google account I wouldn't try this out.
Google has been known to close accounts and "related" accounts for abuse (as defined by them). So even if you create another account, don't expect your main account to survive if there's any possible link between them.
They are the judge, jury and executioner, so eff around at your own peril.
$20 a month gives you "unlimited" storage at google. they've gladly taken my encrypted files for years now and I'm up to 80TB. i think it's more than reasonable to pay them for that type of service and be slightly above board (the account type i have says i need a minimum of 5 people but it's just me).
Which means that if, for whatever reason, they decide to close your account - say one of the pictures in those 80TB triggers something that looks like CSAM [1] - you are seriously up the creek.
Ditto if someone gets hold of your phone and changes the login on your account, or they decide to not let you in because something "looks suspicious".
You are brave. I hope, for your sake, you have a local backup.
How long does it take for you to download 80TB? From what I can see Google allows you to download 10TB per day but who knows when they will change that limit.
Previous average lifespan of a human being. Just needed a number to stop the analysis at. The one that comes bundled with the implication "Welp, I'm dead now" felt appropriate given that if you are dead, and the data is too hard to access, probate will likely be the end of your data storage foray. Any longer, and you're most certainly talking organizational scale preservation efforts.
Price per TB appears to have fallen below $8, so that's about $640 worth of storage. Basically, if you were to buy your own hard drives, it works out to roughly $27/mo amortized over two years.
This particular account, while loss-making for them, is not so by all that much.
A comparable Cloud Storage account on GCP with Coldline storage would be $320/month ($0.004/GB/month), or just $96/month for Archive storage ($0.0012/GB/month).
The actual cost to Google is probably < $80/month for this 80TB (most of the data is going to be stored in a version of archival storage, given the standard restriction of 10TB per day on export).
80TB is also a heavy outlier: given the typical available bandwidth today and the disk sizes commercially available to most users, it takes a lot of dedicated effort and time to upload this amount of data into the cloud.
Also, Google's personal storage pricing is not competitive for pure storage; Backblaze is only $7/month, for example. The higher price and value is derived from being able to integrate with other Google products and provide storage for them, like Gmail, Photos, etc.
A well chosen model has an AFR of well below 1%.
To get about, say, 100TB, you'd need a dozen drives or so with ZFS and a nice enclosure. It is unlikely that even one of them will fail in a given year, and with the redundancy you will not experience data loss.
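A quick check of that claim, assuming independent failures at a given annualized failure rate (AFR):

```python
def p_any_failure(n: int, afr: float) -> float:
    """Probability that at least one of n drives fails in a year,
    assuming independent failures at the given annualized rate."""
    return 1 - (1 - afr) ** n

# With 12 drives at a 1% AFR there is still roughly an 11% chance of
# at least one failure per year, which is why the ZFS redundancy
# matters, not just the low per-drive rate.
p = p_any_failure(12, 0.01)
```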
This is starting to change. India has a new law requiring social media companies to have a grievance officer and a formal grievance process that allows users to speak to an actual human. It lays out a set of valid reasons to suspend a user; companies cannot suspend or penalize a user for reasons not on the list, and must act in a fair manner as prescribed by law. If the grievance process fails, it can be appealed to a government office and then the courts.
I wrote something just like this with Discord, and I even got it to host full videos which you can play back in browser. It's a good backup service. [0]
I want to expand this into a fully modular service where you write payloads and scripts for various services, so when you upload a file it's spread out across many different providers. When you're downloading, you just go down the list, check what still exists, and verify the checksum. This should be stable for many years.
I plan to take a look into Facebook and see what can/can't be accessed there. I had this exact thought with YouTube and thought about using a pixel reader to extract the data. Same idea for different image hosting services like imgur.
I've observed that with any piece of technology where you're permitted to write / upload information and freely access it afterwards, someone will attempt to (ab)use it for file storage and write a blog article about it later :)
My favorite example of this was people storing files in "secret" subreddits by using posts and comments to store bytes. When they were later discovered by other users, the seemingly random strings sparked a huge conspiracy about their possible meaning.
However, you always have the problem that your unwilling host may remove your "files". I sometimes wonder about file storage using a textual output format that can't be distinguished from normal user interactions.
I remember when GMail was invite-only, and at the time they were offering quite a lot more storage for mail than anyone else, so people started using their GMail drafts to store files.
That was the first time I came across such a thing.
Someone even made an extension for Windows XP that allowed you to mount GMail as a storage volume.
> GMail Drive is a Shell Namespace Extension that creates a virtual filesystem around your Google Mail account, allowing you to use Gmail as a storage medium.
Writing the GmailFS HOWTO, and fixing a bug in the process, was my first exposure to the power of OSS. Looking back, I'm pretty sure this is what led me to pursue software engineering as a career!
> I sometimes wonder about file storage using a textual output format that can't be distinguished from normal user interactions.
You could use a reproducible LM (for instance using Bellard's NNCP as basis), and encode one bit in one word by taking the {first, second} most probable next word.
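A toy sketch of that rank-based scheme, with a hard-coded next-word table standing in for the reproducible language model (the table and words below are made up; a real system would use something like NNCP's predictions, identical on both ends):

```python
# Hide one bit per word: emit the model's most probable next word
# for a 0 bit, the second most probable for a 1 bit. The decoder,
# running the same model, recovers each bit from the word's rank.
NEXT = {  # made-up stand-in for a deterministic LM
    "the":   ["cat", "dog"],      # [most probable, second most probable]
    "cat":   ["sat", "slept"],
    "dog":   ["ran", "sat"],
    "sat":   ["the", "on"],
    "slept": ["the", "on"],
    "ran":   ["the", "on"],
    "on":    ["the", "cat"],
}

def encode(bits, start="the"):
    words, w = [start], start
    for b in bits:
        w = NEXT[w][b]          # pick by rank according to the bit
        words.append(w)
    return words

def decode(words):
    # each word's rank under the previous context is the hidden bit
    return [NEXT[prev].index(w) for prev, w in zip(words, words[1:])]
```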
This is fascinating! And the file transfer can be then fully disguised as a conversation, with a ChatGPT-like client and all. An unsuspecting user will see a chat bot; a specialized client app would be able to receive files by talking to it.
In this modern cloud-giant world it's abused for file storage yes. But I come from the more traditional web hosting world of the early 2000s and back then the general rule was that anything that could store information online would sooner or later be used to store porn.
This makes me think of Turing machines which store their own code inside themselves, which you can use for all kinds of interesting proofs. I wish I could find more about this.
> I sometimes wonder about file storage using a textual output format that can't be distinguished from normal user interactions.
I guess it depends on what noise-to-signal density you’re after.
With a long enough ChatGPT-generated output, no one would question a few out-of-place characters or even an emoji. With 3000+ different emojis to choose from, each one encodes more than a byte of data.
Another idea is using “they’re”, “their”, “there” as bits.
I vaguely recall some secretive company (perhaps Apple) using adjustment of spacing, capitalization, etc. to encode a unique serial number in messages sent by the CEO, which could then be used to trace leaks.
It's a plot point in Patriot Games, a 1987 Tom Clancy novel that introduced the term "canary trap" for this trick. He says he invented the term, but not the technique, which was already in use.
In a spat over the plot of Star Trek III (so, early 1980s), Harve Bennett distributed slightly different versions of the script, allowing him to track a leak back to Gene Roddenberry.
The book SpyCatcher says it was in routine use at MI-5, and you can find variations of it in lots of fiction too.
Pretty easy to do that, use a fixed point implementation of GPT(N) of whatever size you like and range code your data into the model probabilities. This also will achieve a close to rate optimal embedding-- allowing you to embed about as much data as the language model thinks the text has...
If you encrypt the data and include a checksum or other identifying bytes in the ciphertext you can even have unwitting human participants in the discussions and if their posts are context your embedded data will be credible replies. You just have to be sure that threading behavior doesn't make it impossible to give the decoder identical context.
One of the things I’ve successfully used YouTube for was video storage of my security camera system. Unlimited video storage with a simple app to watch them in case I need to check something out!
And it’s simple: camera uploads automatically via FTP, inotifywait script uploads to google!
All they'd have to do is limit the amount of private videos you're allowed to store. If your only option for storing unlimited security footage is to make it public, then people probably wouldn't do that.
Alternatively, if they're allowed to use the footage to train some AI that will help them take over the world, then maybe they want all your random footage for free.
Security cameras are usually low-framerate and compress highly anyway, due to not much happening between each frame, so I doubt it's going to be a significant cost in comparison to all the other, far more massive content which is also constantly being uploaded.
haha this is so cool, i made something similar https://punkjazz.org/scrambled-eggs/ a few years ago to explore transferring files directly through the camera so nobody can "see" what you download, because no packets go through the internet; i managed to do 10kbps or so
the modern qr readers are so fast and easy to use, it's unbelievable
Nice! It's such a neat way to transfer information :)
This guy extended the idea using fountain codes, which allows you to miss arbitrary frames and still recover the full message without waiting for the missed frames to re-appear:
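The peeling idea behind fountain codes can be sketched like this (a naive uniform degree distribution for brevity; real LT codes draw degrees from a robust soliton distribution, which makes decoding succeed with far fewer symbols):

```python
import random

def encode_symbol(blocks, seed):
    """One rateless symbol: the XOR of a pseudo-random subset of
    source blocks. Only the seed needs to travel with the value,
    since the receiver can recompute the subset from it."""
    rng = random.Random(seed)
    d = rng.randint(1, len(blocks))          # naive degree choice
    idxs = rng.sample(range(len(blocks)), d)
    val = 0
    for i in idxs:
        val ^= blocks[i]
    return set(idxs), val

def peel_decode(symbols, n):
    """Classic peeling: find a symbol with exactly one unknown block,
    solve it, substitute everywhere, repeat until stuck or done."""
    out = [None] * n
    progress = True
    while progress:
        progress = False
        for idxs, val in symbols:
            unknown = [i for i in idxs if out[i] is None]
            if len(unknown) == 1:
                acc = val
                for i in idxs:
                    if out[i] is not None:
                        acc ^= out[i]
                out[unknown[0]] = acc
                progress = True
    return out
```

The receiver just keeps collecting symbols, in any order and with arbitrary gaps, until peeling completes; that is exactly why missed video frames stop mattering.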
i was actually thinking about that, it could be even more cool now with modern text-to-speech and whisper and some funky word-based encoding with a huge dictionary, like:
teacher: 0b00010010101001, school: ...
and then the website can encode the data as a sentence and just text-to-speech it, and the receiver can use whisper to speech-to-text and decode
it will be the creepiest thing because it can be very steganographic and sound like a real sentence
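A minimal sketch of that word-dictionary idea (the four-word vocabulary here is made up and carries only 2 bits per word; a real 256-word list, chosen to be easy for speech recognition to distinguish, would carry a full byte per word):

```python
# One "symbol" per spoken word: TTS reads the sentence out, speech-to-
# text recovers the words, and decoding is a dictionary lookup.
WORDS = ["teacher", "school", "garden", "river"]  # made-up vocabulary
BITS = 2  # log2(len(WORDS)) bits per word

def encode(data: bytes) -> str:
    bits = "".join(f"{b:08b}" for b in data)
    chunks = [bits[i:i + BITS] for i in range(0, len(bits), BITS)]
    return " ".join(WORDS[int(c, 2)] for c in chunks)

def decode(sentence: str) -> bytes:
    bits = "".join(f"{WORDS.index(w):0{BITS}b}" for w in sentence.split())
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```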
Video steganography might be a better approach and would be less likely to trigger account banning or claims of abuse by the hosters. The issue of avoiding data loss due to lossy compression algorithms seems to be an active area of research:
> "Moreover, most video-sharing channels transmit the steganographic video in a lossy way to reduce transmission bandwidth or storage space, such as YouTube and Twitter. . . Robust video steganography aims to send secret messages to the receiver through lossy channels without arousing any suspicions from the observer. Thus, the robustness against lossy channels, the security against steganalysis, and the embedding capacity are equally important."
I suppose in this project, the blocks of pixels are large enough to avoid data loss due to compression?
You could do this with any service which accepts user content. You could have a tumblr blog focused on "paranormal phenomena in white noise images" and fill it full of data embedded in images. If anyone ever asks, you just explain that, like many pattern illusions, not everyone can see the images contained within - try squinting, or covering the eye on the predominant side of your body, stand on your head, blah blah blah.
This is even easier, because jpg's ignore additional data past the end of the file. Post a low-res ~200kb jpg that has an additional ~20mb of data appended. It'll still render perfectly fine.
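The trick in miniature: JPEG decoders stop at the End-of-Image marker (FF D9), so any bytes appended after it are ignored when rendering. The image bytes in the test are a stand-in, not a real image, and this sketch assumes the stream contains no embedded thumbnail carrying its own EOI marker:

```python
EOI = b"\xff\xd9"  # JPEG End-of-Image marker; decoders stop here

def embed(jpeg: bytes, payload: bytes) -> bytes:
    """Append payload after the image; viewers still render it fine."""
    assert jpeg.endswith(EOI), "not a complete JPEG stream"
    return jpeg + payload

def extract(blob: bytes) -> bytes:
    """Everything after the first EOI is the hidden payload
    (assumes no embedded thumbnail has its own EOI)."""
    end = blob.index(EOI) + len(EOI)
    return blob[end:]
```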
You could do the same thing with PNGs and custom chunk types. Although in both cases you run the risk that some paranoid developer might filter out unexpected chunk types or additional data, so in both cases it would be best to put the data in the image payload.
The other consideration is that Tumblr was always very “creator” oriented and while they might produce thumbnails of various sizes the original image is still available and not mangled by resizing algorithms. Other free image hosts are going to crush that image down the maximum amount tolerable to the human eye. Google even does that for paid photo hosting.
I understand that the goal is to make the data survive video compression, but wouldn't it make sense to use at least some color information instead of entirely black and white pixels?
Chroma is lossier than luma in most common video codecs. AVC is 4:2:0 on YouTube. 4:2:0, quite confusingly, means that chroma is halved in both dimensions compared to luma (so one chroma pixel is congruent with four luma pixels). As well, most decoders will apply filtering on the chroma to upsample it to match the luma, meaning that your color boundaries are going to be indistinct at best, and you might even lose the original chroma values entirely in the process. You'd have to use multiple chroma pixels as one metapixel in order to increase resilience, which would diminish the capacity. With modern codecs, a monochrome signal seems better to use for actual data, although I could see it being useful to use chroma for metadata.
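A quick illustration of why per-pixel chroma data dies under 4:2:0, assuming a simple box filter for the downsample (real encoders vary in their filtering, but the resolution loss is the same):

```python
def subsample_420(chroma):
    """Naive 4:2:0 downsample: average each 2x2 block of a chroma
    plane, producing one chroma sample per four luma pixels."""
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1] +
              chroma[y + 1][x] + chroma[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

# A per-pixel checkerboard of two chroma values averages to mush:
checker = [[0, 255], [255, 0]]
# subsample_420(checker) -> [[127]]: both original values are gone.

# The same two values as 2x2 metapixels survive intact:
meta = [[0, 0, 255, 255],
        [0, 0, 255, 255]]
# subsample_420(meta) -> [[0, 255]]
```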
Seems like it could benefit from forward error correction to defend against bit errors (this is how QR codes survive big chunks being partially obscured or replaced by logos, and also how CDs survive being scratched within certain limits).
You can choose how much correction you get, in terms of how many bit errors you can correct per n bits. And you need surprisingly few extra bits to get pretty great performance on a channel with a "reasonable" bit-error rate (like under 10% overhead). You can crank up the strength of the error correction if you anticipate a noisier channel.
QR codes have 4 levels of correction you can use depending on how robust you wish them to be. CDs and DVDs use two chained, fixed, levels to keep the decoders simple. CDs have 25% overhead, but their correction is very strong: they can correct 4000 bits in a row.
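For a concrete miniature of the idea, the classic Hamming(7,4) code stores 4 data bits in a 7-bit codeword and corrects any single flipped bit:

```python
def hamming74_encode(d):
    """4 data bits -> 7-bit codeword laid out as [p1,p2,d1,p3,d2,d3,d4],
    with each parity bit covering the standard position sets."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4   # covers positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4   # covers positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based index of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

Real-world schemes like the Reed-Solomon codes in QR codes and CDs work on symbols rather than single bits, which is what lets them correct long bursts of damage, but the syndrome idea is the same.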
There are two encoding modes, RGB and B/W. It uses a pixel-to-data width of 2x2, but says YouTube's compression algorithm is brutal, and one corrupted pixel already renders the whole thing corrupted.
I think the size of the effective chroma metapixels is more important than the range of values. You need to make them larger in order to keep the decoder from blending them together when upscaling the 4:2:0 chroma.
Now, if you're using a 4:4:4 format to do this, then you should be able to use smaller chroma metapixels (I still wouldn't use the full chroma resolution, though, unless you're using a high bitrate or a lossless codec). However, that risks data corruption if passed through a pipeline that downsamples the chroma.
This reminds me of a stupid idea I had: would it theoretically be possible to store data using the backbone of the Internet itself? You'd bounce packets (probably TCP) back and forth between two hosts with bytes that aren't actually written to a disk anywhere so they just exist as a stream until one end decides to copy a section for itself.
This isn't a new idea. It can be traced back to delay-line memory [1] and many thought experiments have been suggested to use a large network as such. Even some actual demos have been made [2][3].
Suckerpinch did a video on “harder drives” where he implemented a block storage device by storing data in ping packets. It’s one of my all-time favourite technical talks - his style is amazing, and he’s an incredible storyteller.
You’re effectively relying on two computers being up and running 24/7. It’d be twice as good an idea (which is still a very low number) to just store that data in RAM on a single device rather than rely on two.
This is bound to get you banned. I would do it a little bit more clever (with lower bitrate/throughput/storage sizes)...
Encode the data inside audio, preferably outside the human audible range, and then use a nice video of singing birds, or whales talking, and use the "hidden" frequencies to hide the data.
I don't know if Youtube has any filters that cut out frequencies, but this way they can't ban you, since you've uploaded a really nice personal video of your singing birds, instead of the conspicuous looking QR-like codes as in the OP ;-)
With any lossy audio compression algorithm, everything outside the human audible range is filtered away completely as a first step. That's compression 101.
Also there's much less bandwidth in the audio channel than the video channel, and then far less again if you're trying to hide a signal in another signal.
Do this at your own risk. I've done this with lidar data (which didn't need to be as precise as binary, which is what I'm seeing in this post), and it worked fine. 3 years later I revisited the project and it was broken, because YouTube had compressed the files in such a way that it made the lidar just inaccurate enough to be unusable. I can't imagine storing data in binary, where just one bit wrong screws everything.
I have many old videos that have lost their "HD" encoding, and now look like potato vision. I no longer (silly that I did) trust YouTube for video storage.
Nice work! I made a much worse variant of this years ago, with a “mosaic” mode[1]: whatever YouTube was doing for compression at the time handled multiple QRs tiled next to each other much better than it did a single large one.
Does YouTube let you store unlimited video content (real video, like screen recordings of your own work - no shady or sneaky stuff, nor any copyrighted material)?
With all videos marked private, so they are just "storage" for the account owner: no other users can access them, and YouTube cannot monetize them?
I had a good look into these sorts of technologies but the host almost always changes the file so it makes it impossible to retrieve the data hidden in the file.
You need a file hosting platform that guarantees not to change the uploaded file.
If you look at the example video, it doesn't depend on the video not being changed, but it does depend on a minimum level of quality. That is, as long as the video quality is high enough (720p in this case) to get back the original black and white pixels, you're fine. The data is not hidden, it's there in plain sight in the video.
It's described in the README. The video has 2x2 pixel blocks that are either black or white, so each one signifies a bit. So a 1920x1080 frame encodes 518,400 bits = 64.8 KB.
The assumption is that video compression won’t mess up those blocks beyond recognition, so you should retain the information as long as the rendered resolution and bitrate don’t drop too low.
Maybe this could be improved by e.g. using 32 colors instead of 2, and bumping the block size to 3x3 (for safety) which should yield ca 144KB per frame.
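The arithmetic behind those figures, as a quick sanity check (the 3x3 / 32-color parameters are the hypothetical ones from the comment above, not anything the README implements):

```python
import math

def capacity_bytes(width=1920, height=1080, block=2, levels=2):
    """Bytes per frame for square pixel blocks, each carrying one of
    `levels` distinguishable values (log2(levels) bits per block)."""
    blocks = (width // block) * (height // block)
    bits = blocks * math.log2(levels)
    return bits / 8

# README's mode: 2x2 black/white blocks at 1080p:
# (960 * 540) blocks * 1 bit = 518,400 bits = 64,800 bytes = 64.8 KB
bw = capacity_bytes(block=2, levels=2)

# Hypothetical mode: 3x3 blocks with 32 colors (5 bits per block):
# (640 * 360) blocks * 5 bits = 1,152,000 bits = 144,000 bytes = 144 KB
color = capacity_bytes(block=3, levels=32)
```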
The block size should honestly be tuned for the codec in use, chiefly to determine the best block size to fit with the codec's macroblock size. That's usually either 8x8, or with newer codecs 16x16. I feel like something like maybe 8x2 would be smart, and I like the idea of monochrome for resiliency, since chroma is downsampled. The fewer possible pixel combinations you have within a macroblock, the better the compression will probably end up being as well. And 8x2 would somewhat evoke the look of the old video backup systems as well, for the fun of the nostalgia of that.
It'd be cool to add a FUSE wrapper around this. At one point I had a POC for a few of these sorts of things going (not as cool as this project, just data stored to X free cloud store/metadata), and creating a redundant transparent FUSE wrapper was probably the next step. With multiple sources, you could even mux data between slow/unreliable sources (content hosts in e.g. russia or asia) to 'stripe' the data. And then, you could make these modular so that new sources could be onboarded easily...
People do this all the time with any web connected service that accepts data. People use open strings in AWS services, like lambda function names, to store arbitrary bits.
Using this can get your Google account and related IP addresses banned. Isn't this sort of vandalism? But why attack YouTube of all places? Do it to TikTok instead; they won't notice the difference (LOL). Normally I would've said "delete this", but today's political climate definitely demands more free space on the internet per individual, so...
This looks like a fun project to tinker around with. I have one small request. The pack file in the repo is 146MB. Is there a way to make that smaller on github?
Interesting to note how black and white pixels get compressed less and thus make for easier restoration. It would be interesting to generate a 512x512 QR code that changes every 3 frames to see how well that would work for recovering data.
I wonder if you could get a better pixels/bit ratio when using DCT/2DFFT based encoding since you'd still encode lower frequency data but it would be in a format that compression algorithms would also try to maintain.
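A toy 1D version of that idea (real codecs use 8x8 2D DCTs, and the amplitude and coefficient choices below are arbitrary): embed bits in the signs of low-frequency coefficients, then simulate compression by zeroing the high-frequency ones and check that the bits survive.

```python
import math

def dct(x):  # orthonormal DCT-II
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                for n in range(N)) * math.sqrt((1 if k == 0 else 2) / N)
            for k in range(N)]

def idct(X):  # inverse (DCT-III); exact roundtrip for the pair above
    N = len(X)
    return [sum(X[k] * math.sqrt((1 if k == 0 else 2) / N)
                * math.cos(math.pi * (n + 0.5) * k / N)
                for k in range(N))
            for n in range(N)]

def embed(bits, n=8, amp=50.0):
    """Put each bit in the sign of a low-frequency coefficient
    (skipping DC), then return the pixel-domain block."""
    X = [0.0] * n
    for i, b in enumerate(bits):
        X[1 + i] = amp if b else -amp
    return idct(X)

def recover(pixels, nbits):
    """Read the bits back from the coefficient signs."""
    X = dct(pixels)
    return [1 if X[1 + i] > 0 else 0 for i in range(nbits)]
```

Since lossy codecs spend their bits preserving exactly these low-frequency coefficients, data placed there should in principle survive transforms that destroy a naive pixel grid, at some cost in bits per pixel.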
Considering this might very well end up you losing your google account, it is a very necessary warning.
If anything, the author should be more clear about what happens if youtube gets mad: you might lose your google account along with access to mail, drive, photos etc
The warning is fine, the warning while playing Pollyanna is annoying and disingenuous. To put it in a more constructive way - the author should be proud of the hack and all the fun and turmoil it could cause.
I feel like this is overcomplicating things. You should be able to download the original video you uploaded instead of downloading a compressed version. I'm sure the uncompressed version still exists.
Ensuring that you can retrieve the data from the viewable video means this is also a way of transferring files, one that works even for people who can't download the original video.
A dumb yet interesting idea, but if you care about your data you shouldn't put it on Google, especially if you are abusing their service; and if you don't care, why even waste your time doing all this, except for fun?
i would've saved this "improvement" for my own project, but consider using colours instead of just black and white squares for higher data density! i am unsure how much compression would affect its effectiveness.
> A portion of the signal would be used for timing, metadata and error correction, so the program could tell you if the data was sufficiently damaged upon restore.
LGR has a video on the PC version from Danmere: https://youtu.be/TUS0Zv2APjU
Here's a video example of the Amiga industry's take on the idea: https://youtu.be/VcBY6PMH0Kg?t=573
Sony even did this in 1980 to record CD-quality PCM audio onto VHS tape. https://youtu.be/bnZFLzBO3yc