
Take a look at aifiles: https://github.com/jjuliano/aifiles

Running this with a local model such as Vicuna or Alpaca could be the best choice for those who want to keep their data to themselves.

Can you imagine the space savings from a system that builds a real normalized DuckDB database, with zstd compression, join tables and all, from your big dump of tar.xml.gz files? One that automagically converts all of your media to AV1 and Opus to save space and remove any proprietary codec requirements? With clear collation and a directory layout similar to the Linux style?
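To make the media part concrete, here is a rough sketch of what the conversion step could look like. This is not how aifiles does it; the function name, the extension lists, and the quality settings (CRF 30, 96k audio) are my own assumptions. It only builds the ffmpeg command and leaves running it to you, and it assumes an ffmpeg build with libsvtav1 and libopus enabled.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical extension lists; adjust to whatever is in your own dump.
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".wmv"}
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".m4a", ".wma"}


def build_ffmpeg_command(src: str) -> Optional[List[str]]:
    """Return an ffmpeg argument list that re-encodes src to AV1/Opus,
    or None if the file does not look like media."""
    path = Path(src)
    ext = path.suffix.lower()

    if ext in VIDEO_EXTS:
        # Write next to the source with a distinct name so an .mkv input
        # is never overwritten by its own output.
        out = path.with_name(path.stem + ".av1.mkv")
        return [
            "ffmpeg", "-i", str(path),
            "-c:v", "libsvtav1", "-crf", "30",  # AV1 video (assumed quality)
            "-c:a", "libopus", "-b:a", "96k",   # Opus audio (assumed bitrate)
            str(out),
        ]

    if ext in AUDIO_EXTS:
        out = path.with_suffix(".opus")
        return [
            "ffmpeg", "-i", str(path),
            "-vn",                              # ignore embedded cover art
            "-c:a", "libopus", "-b:a", "96k",
            str(out),
        ]

    return None  # not media, leave it alone
```

You would pass the result to subprocess.run(cmd, check=True) and only delete the original after checking the output. Keep in mind that both encodes are lossy, so converting lossless sources like FLAC trades quality for space.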

https://github.com/jjuliano/aifiles seems like one of the best ideas for data organization. It just needs some polish and support for local-only models.



