Using Vicuna or Alpaca for this could be the best option for those who want to keep their data.
Can you imagine the space savings you could get from a system that builds a properly normalized DuckDB database, with zstd compression, join tables and all, from your big dump of tar.xml.gz files? Or one that automagically converts all of your media to AV1 and Opus to save space and remove any proprietary codec requirements?
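A rough sketch of the normalization half of that idea. It assumes a hypothetical dump where each `.xml.gz` file holds `<item>` elements with a `<title>` and repeated `<tag>` children. Python's built-in `sqlite3` stands in for DuckDB here, so there is no zstd compression in this sketch. The schema is the point: each tag string is stored once, and a join table links items to tags.

```python
import gzip
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record format; a real dump will look different.
SAMPLE_XML = b"""<items>
  <item><title>Trip photos</title><tag>travel</tag><tag>photos</tag></item>
  <item><title>Beach video</title><tag>travel</tag><tag>video</tag></item>
</items>"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a normalized many-to-many schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE item_tags (
            item_id INTEGER REFERENCES items(id),
            tag_id INTEGER REFERENCES tags(id),
            PRIMARY KEY (item_id, tag_id)
        );
        """
    )
    return conn


def load_dump(conn: sqlite3.Connection, gz_bytes: bytes) -> None:
    """Parse one gzip-compressed XML blob into the normalized tables."""
    with gzip.open(io.BytesIO(gz_bytes)) as fh:
        root = ET.parse(fh).getroot()
    for item in root.iter("item"):
        cur = conn.execute(
            "INSERT INTO items (title) VALUES (?)", (item.findtext("title"),)
        )
        item_id = cur.lastrowid
        for tag in item.iter("tag"):
            # Store each distinct tag string once, then look up its id.
            conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag.text,))
            (tag_id,) = conn.execute(
                "SELECT id FROM tags WHERE name = ?", (tag.text,)
            ).fetchone()
            # Join table row linking this item to this tag.
            conn.execute(
                "INSERT OR IGNORE INTO item_tags (item_id, tag_id) VALUES (?, ?)",
                (item_id, tag_id),
            )


if __name__ == "__main__":
    conn = make_db()
    load_dump(conn, gzip.compress(SAMPLE_XML))
    print(conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0])  # 3
```

Moving this to the actual DuckDB Python package would mean checking the SQL against DuckDB's dialect and looking up in its storage docs how to get zstd specifically.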
Clear collation into directories, laid out Linux style?
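A tool driven by a local model would make the sorting decision more cleverly. In this minimal sketch, fixed extension rules stand in for the model. The directory names loosely follow the XDG user directories (Music, Videos, Pictures, Documents). The rule table and the `Misc` fallback are hypothetical, and the code only plans the moves without touching any files.

```python
from pathlib import PurePosixPath

# Hypothetical extension -> directory rules, loosely modeled on XDG user dirs.
RULES = {
    "Music": {".opus", ".flac", ".mp3", ".ogg"},
    "Videos": {".mkv", ".webm", ".mp4"},
    "Pictures": {".jpg", ".jpeg", ".png", ".avif"},
    "Documents": {".pdf", ".txt", ".md", ".odt"},
}


def target_dir(path: str, default: str = "Misc") -> str:
    """Pick a top-level directory for a file based on its (case-insensitive) extension."""
    suffix = PurePosixPath(path).suffix.lower()
    for directory, exts in RULES.items():
        if suffix in exts:
            return directory
    return default


def plan_moves(paths: list[str]) -> dict[str, str]:
    """Map each input path to its proposed new location; nothing is moved."""
    return {p: str(PurePosixPath(target_dir(p)) / PurePosixPath(p).name) for p in paths}


if __name__ == "__main__":
    for src, dst in plan_moves(["dump/Song.OPUS", "dump/clip.webm", "dump/notes.xyz"]).items():
        print(f"{src} -> {dst}")
```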
https://github.com/jjuliano/aifiles seems like one of the best ideas for data organization; it just needs some polishing and support for local-only models.
13 years and counting, but I’m sure I’ll clean mine up eventually.