They recommend against it for the traditional CC licenses (CC-BY-SA, etc.), but CC0 is perfectly fine for code, since it's basically a dedication to the public domain for jurisdictions that don't support that. The FAQ for CC0 says this explicitly: https://wiki.creativecommons.org/wiki/CC0_FAQ#May_I_apply_CC...
Unfortunately the term "open-source" AI alone is meaningless without knowing the users definition of open-source. More often than not it just means "openly available". In this case for example they could be using Whisper, which is openly available, but from OpenAI and still has all the caveats around used data, resource usage for training etc.
The JSON files still contain images, just not in a regular image format.
You have a 2D array of numbers where each number maps to a color.
If you really want a regular picture format, you can easily convert the arrays.
I mean I think it's fair to assume that TEST-Driven-Development has something to do with testing. That being said, Kent Beck recently (https://tidyfirst.substack.com/p/tdd-outcomes) raised a point saying TDD doesn't have to be just an X technique, which I wholeheartedly agree with.
I like Data-Oriented Design, but beware of one thing: You organise your data like a database? You'll eventually be writing a database management system, unless you can use a framework like one of the many Entity-Component-System ones.
[1] https://creativecommons.org/faq/#can-i-apply-a-creative-comm...