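And if you want to understand how it works, I'd recommend this post (GPT-2 in 60 lines of NumPy) and the post on attention it links to. The concepts are mostly identical to LLaMA, just with a few minor architectural tweaks (e.g. RMSNorm instead of LayerNorm, SwiGLU instead of GELU in the feed-forward block, and rotary position embeddings instead of learned ones). https://jaykmody.com/blog/gpt-from-scratch/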
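To give a flavor of what that post covers, here's a minimal NumPy sketch of my own (not copied from the post) of causal self-attention. It also shows one of the LLaMA tweaks, swapping LayerNorm for RMSNorm. The function names are just illustrative, and it assumes a single head, no batching, and no learned Q/K/V projections.

```python
import numpy as np

def softmax(x):
    # subtract the row max for numerical stability
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_attention(q, k, v):
    # q, k, v: [n_seq, d_k]; single head, no projections (simplifying assumption)
    n = q.shape[0]
    # large negative values above the diagonal so each token can't see future tokens
    mask = (1 - np.tri(n, dtype=q.dtype)) * -1e10
    scores = q @ k.T / np.sqrt(q.shape[-1]) + mask
    return softmax(scores) @ v

def layer_norm(x, g, b, eps=1e-5):
    # GPT-2 style: subtract mean, divide by std, then scale and shift
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(var + eps) + b

def rms_norm(x, g, eps=1e-6):
    # LLaMA style: no mean subtraction and no bias, just divide by the root-mean-square
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return g * x / rms
```

Since the first token can only attend to itself, the first row of the attention output is just `v[0]`. And for an input that already has zero mean, LayerNorm (with zero bias) and RMSNorm give essentially the same result, which is part of why the swap is considered a minor tweak.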