Demystifying terminology behind LLMs
Introduction to Autoregressive Decoding
The architecture of a Large Language Model
The innards of a transformer layer
How to find the needle in the haystack
Connection to papers