Other thing I'm worried about: how do you retract facts from an ML model? I guess it's impossible; you'd need to retrain from scratch with part X removed from the training set. Or... people could invent layered ML models, similar to Docker images - each layer would be tagged with the data it was trained on. Then at least you'd have a cached partially-trained model to reuse in the next training session. Nasty stuff.
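To make the layered idea concrete, here's a toy sketch (all names and the "model" are made up for illustration): split the training data into shards, checkpoint the model after each shard, and when a shard has to be retracted, resume from the last checkpoint that shard never touched and retrain only what came after. This is roughly the shape of proposed "sharded" machine-unlearning schemes, not something GitHub actually does.

```python
from copy import deepcopy

class ToyModel:
    """Stand-in for a real model: just accumulates a running mean."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def train_on(self, shard):
        for x in shard:
            self.total += x
            self.count += 1

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0

def train_layered(shards):
    """Train shard by shard, caching a checkpoint before each layer."""
    model = ToyModel()
    checkpoints = [deepcopy(model)]  # checkpoints[i] = state before shard i
    for shard in shards:
        model.train_on(shard)
        checkpoints.append(deepcopy(model))
    return model, checkpoints

def retract(shards, checkpoints, k):
    """Drop shard k: resume from the last checkpoint untouched by it."""
    model = deepcopy(checkpoints[k])      # reuse cached work up to shard k
    for shard in shards[k + 1:]:          # retrain only the later shards
        model.train_on(shard)
    return model

shards = [[1.0, 2.0], [100.0, 200.0], [3.0, 4.0]]
full, ckpts = train_layered(shards)
scrubbed = retract(shards, ckpts, 1)      # retract the middle shard
print(full.mean, scrubbed.mean)           # → 51.66... 2.5
```

The catch, of course, is that real training isn't this order-independent: a checkpoint after shard k still reflects every earlier shard, so only the *suffix* after the removed layer gets cheap retraining.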
Or, instead of inventing complicated layered ML models, GitHub could just use each repo's license information to decide what's okay to train on. Detecting licenses is already a feature on that site.
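A crawler-side filter along those lines could be tiny. GitHub's REST API already reports a detected license per repo (as an SPDX id, e.g. via the `/repos/{owner}/{repo}/license` endpoint), so the sketch below just filters metadata shaped like that response. The allowlist is an illustrative assumption, not legal advice.

```python
# Licenses we'd (hypothetically) consider okay for training data.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "Unlicense"}

def ok_to_train_on(repo):
    """repo: dict shaped like GitHub's API response, with license.spdx_id."""
    license_info = repo.get("license") or {}
    return license_info.get("spdx_id") in PERMISSIVE

repos = [
    {"name": "a", "license": {"spdx_id": "MIT"}},
    {"name": "b", "license": {"spdx_id": "GPL-3.0"}},
    {"name": "c", "license": None},  # no detected license: skip it
]
print([r["name"] for r in repos if ok_to_train_on(r)])  # → ['a']
```

Default-deny (skip anything without a recognized license) seems like the only sane policy here, since license detection is heuristic and plenty of repos have no license at all.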