They clearly expected the leak; they distributed it very widely to researchers. The important thing is the licence, not the access: you are not allowed to use it for commercial purposes.
You are certainly partly right, but it's also about liability. Those models might output copyrighted material, which Facebook doesn't want to get sued over. So they restrict the model to research. If someone then uses it to replicate copyrighted work, Facebook isn't responsible.
OpenAI faces the same liability concerns, though. I think IP concerns are low on the list, given the past success of playing fast and loose with the emergent capabilities of new tech platforms.
For example, WhatsApp’s greyhat use of the smartphone address book.
The US government also has a stake in unbridled growth and seems, in general, to give a pass to businesses exploring new terrain.
They can have reasonable suspicion, sue you, and then use discovery to find any evidence at all that your models began with LLaMA. Oh, you don't have substantial evidence for how you went from 0 to a 65B-parameter LLM base model? How curious.
Same way anti-piracy worked in the 90s: cash payouts to whistleblowers. Yes, those whistleblowers are guaranteed to be fired employees with an axe to grind.
LLaMA uses Books3, a source of pirated books, to train the model.
So either it is very hypocritical of them to apply the DMCA while the model itself is illegal, or they are trying to somewhat stop it from spreading because they know it is illegal.
Anyway, since the training code and data sources are open source, you 'could' have trained it yourself. But even then, you would still be at risk over the pirated books.