Written by Amber Jain

Understanding LLM Size, Weights, Parameters, Quantization, KV Cache & Inference Memory

How much RAM do you need to run a 30 billion parameter model? Why are there multiple versions of the same model at different file sizes? What does "8-bit quantization" actually mean, and how does it trade precision for speed and memory? If you're running language models locally, or planning to, understanding the relationship between parameters, weights, quantization, and memory is essential.
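Before diving in, here is a rough back-of-the-envelope sketch of the core relationship: weight memory is roughly parameter count times bytes per weight. The function name and numbers below are illustrative; this estimate covers weights only and ignores the KV cache and runtime overhead, which add more on top.

```python
def model_weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Estimate memory needed to hold model weights alone.

    Illustrative formula: parameters x bits per weight, converted to
    decimal gigabytes. KV cache and runtime overhead are not included.
    """
    bytes_per_weight = bits_per_weight / 8
    total_bytes = params_billion * 1e9 * bytes_per_weight
    return total_bytes / 1e9

# A 30B-parameter model at common precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_weight_memory_gb(30, bits):.0f} GB")
```

At 16-bit precision, a 30B model needs roughly 60 GB just for weights; 8-bit halves that to about 30 GB, and 4-bit halves it again to about 15 GB. That is the intuition the rest of this article unpacks.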