
> Cleaning it all up has been on my to-do list for the past 5 years

13 years and counting. But I’m sure I’ll clean mine up eventually.




Take a look at aifiles: https://github.com/jjuliano/aifiles

Running Vicuna or Alpaca locally for this could be the best decision for those who want to keep their data to themselves.

Could you imagine the space savings you could achieve with a system that builds a properly normalized DuckDB database, with zstd compression, join tables and all, from your big dump of tar.xml.gz files? One that automagically converts all of your media to AV1 and Opus to save space and remove any proprietary codec requirements? With clear organization and a directory layout similar to the Linux filesystem hierarchy?
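For the media part, a rough sketch of what one conversion step might look like, assuming ffmpeg is installed and was built with the libsvtav1 and libopus encoders. The function name, the .mkv container choice, and the CRF and bitrate defaults are my own placeholders, not anything aifiles does:

```python
from pathlib import Path


def build_av1_opus_command(src: Path, dst_dir: Path,
                           crf: int = 30,
                           audio_bitrate: str = "96k") -> list[str]:
    """Build (but do not run) an ffmpeg command that re-encodes one file.

    Hypothetical helper: video goes to AV1 via libsvtav1, audio to Opus
    via libopus, and the result is written as Matroska next to dst_dir.
    """
    dst = dst_dir / (src.stem + ".mkv")
    return [
        "ffmpeg",
        "-n",                  # never overwrite an existing output file
        "-i", str(src),        # input file
        "-c:v", "libsvtav1",   # AV1 video encoder
        "-crf", str(crf),      # quality target (assumed default, tune to taste)
        "-c:a", "libopus",     # Opus audio encoder
        "-b:a", audio_bitrate, # audio bitrate (assumed default)
        str(dst),
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass it to subprocess.run
    # once you have checked it against a file you can afford to lose.
    print(" ".join(build_av1_opus_command(Path("in/clip.mp4"), Path("out"))))
```

Re-encoding is lossy, so I would keep the originals until the outputs have been spot-checked.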

https://github.com/jjuliano/aifiles seems like one of the best ideas for data organization; it just needs some polishing and local-only models.


I just have to keep it all until I retire. I’m sure I’ll get around to it then.


Pro tip: I have noticed that most retired people seem to have even less time in their schedule. So I would not count on it :-)


Lol, yeah, my 90-year-old grandmother was also complaining about that. But if it turns out like that, I’ll happily take it.



