1. A thread explaining the inner workings of transformers: https://twitter.com/hippopedoid/status/1641432291149848576?s...
2. A paper by DeepMind that provides pseudocode for important Transformer algorithms: https://arxiv.org/pdf/2207.09238.pdf
3. Another thread specifically on large language models: https://twitter.com/cwolferesearch/status/164044611134855577...
Once again, these are not courses per se, but they do provide intuitive explanations of how transformers work. There is also the nanoGPT series of videos by Karpathy on YouTube. First video here: https://www.youtube.com/watch?v=kCc8FmEb1nY