Abstract: The key-value (KV) cache in large language models (LLMs) now necessitates a substantial amount of memory capacity as its size proportionally grows with the context’s size. Recently, ...