PagedAttention is a good idea, poorly understood. It works well at small scale, but at 10,000 concurrent requests it surfaces second-order effects that the original paper doesn’t discuss: fragmentation, page-swapping pathologies, and cache thrash.
What follows is a primer, plus empirical notes from running it at production scale.