Attention Visualizer — Playground

What it does

Loads a model from the Hugging Face Hub and renders the attention weights it produces for a prompt you control. Built for the moment when you’re trying to convince yourself that a specific head is doing what you think it’s doing.
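
For a single head, the rendered matrix is softmax(QKᵀ/√d) under a causal mask. Each row holds one query token's weights over the keys up to and including itself. The plain-Python sketch below recomputes that for made-up 2-dimensional query/key vectors. It is an illustration, not the tool's code. With Hugging Face `transformers`, calling a model with `output_attentions=True` returns these matrices in `outputs.attentions`, one tensor per layer shaped (batch, num_heads, seq_len, seq_len). That the tool reads them this way is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(queries, keys):
    """Return the seq_len x seq_len weight matrix for one head.

    Row i is softmax(q_i . k_j / sqrt(d)) over the keys j <= i;
    keys after position i are masked out and get weight 0.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    n = len(keys)
    weights = []
    for i, q in enumerate(queries):
        # Scaled dot products against the visible (past and current) keys only.
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in range(i + 1)]
        # Pad the masked future positions with exact zeros.
        weights.append(softmax(scores) + [0.0] * (n - i - 1))
    return weights


# Three toy tokens with made-up 2-dimensional query/key vectors.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in causal_attention_weights(Q, K):
    print(" ".join(f"{w:.2f}" for w in row))
```

The empty upper-right triangle in the printed rows is the causal shape you see in the heatmap.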

Why it’s useful

Reading the attention matrix as raw numbers is hopeless. Reading it as a heatmap with the tokens labeled along both axes makes patterns jump out: causal triangles, sink tokens absorbing leftover attention mass, and induction heads lighting up along a stripe just off the main diagonal. A few minutes here often saves hours of squinting at paper figures.
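
Once a pattern catches your eye, it can help to put a number on it. Below is a minimal sketch over a plain list-of-rows matrix, where rows are queries and columns are keys. `sink_score`, `offset_score`, and `is_causal` are hypothetical helpers for illustration, not features of this tool:

```python
def sink_score(attn, sink=0):
    """Average weight that every query row places on key position `sink`.
    Values near 1 suggest an attention sink (often the first token)."""
    return sum(row[sink] for row in attn) / len(attn)


def offset_score(attn, offset):
    """Average weight on the stripe `offset` positions left of the diagonal
    (key = query - offset). offset=1 is a previous-token pattern."""
    vals = [attn[q][q - offset] for q in range(offset, len(attn))]
    return sum(vals) / len(vals) if vals else 0.0


def is_causal(attn, tol=1e-9):
    """True if no query puts more than `tol` weight on a later key,
    i.e. the upper triangle is empty (the lower-triangular shape)."""
    n = len(attn)
    return all(attn[q][k] <= tol for q in range(n) for k in range(q + 1, n))


# Toy 4-token matrices (rows are queries, columns are keys).
previous_token_head = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
sink_head = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]

print(offset_score(previous_token_head, 1), sink_score(sink_head))
```

An induction head is easiest to test on a prompt that repeats a block of n tokens. Each token in the second copy attends to the token that followed its first occurrence, which sits n - 1 positions back, so `offset_score(attn, n - 1)` is the stripe to check.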

How to use it

Paste a model ID from Hugging Face (e.g. meta-llama/Llama-3.1-8B).
Type a prompt; short ones keep the visualization legible, since each head's matrix grows with the square of the token count.
Pick a layer and head from the sidebar.
Watch the matrix render. Hover any cell to see the query/key tokens.
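
The steps above boil down to three operations: index one layer and head, draw the matrix with labeled axes, and look up a cell. The text-only sketch below assumes a hypothetical nested-list layout `attentions[layer][head][query][key]`, which is the shape of `transformers`' per-layer tensors with the batch axis dropped. `hover` here also returns the weight; that is this sketch's choice, not a claim about the tool:

```python
# Ten shading characters from lightest (weight 0) to darkest (weight 1).
SHADES = " .:-=+*#%@"


def select_head(attentions, layer, head):
    """Pick one head's query x key matrix from a hypothetical nested list
    indexed as attentions[layer][head][query][key]."""
    return attentions[layer][head]


def shade(weight):
    """Map a weight in [0, 1] to one of the shading characters."""
    return SHADES[min(int(weight * len(SHADES)), len(SHADES) - 1)]


def render_heatmap(tokens, attn):
    """Text heatmap: rows are query tokens, columns are key tokens
    (each column headed by the first character of its token)."""
    width = max(len(t) for t in tokens)
    header = " " * (width + 1) + " ".join(t[:1] or "?" for t in tokens)
    rows = [tok.rjust(width) + " " + " ".join(shade(w) for w in row)
            for tok, row in zip(tokens, attn)]
    return "\n".join([header] + rows)


def hover(tokens, attn, query, key):
    """What hovering cell (query, key) reports: both tokens plus the weight."""
    return tokens[query], tokens[key], attn[query][key]


tokens = ["The", "cat", "sat"]
attentions = [[[  # layer 0, head 0
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
]]]
attn = select_head(attentions, 0, 0)
print(render_heatmap(tokens, attn))
print(hover(tokens, attn, 2, 1))
```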

Limitations

Inference currently runs on a backend GPU, so requests for very large models may be queued.
Multi-query and grouped-query attention are visualized per query head. Query heads that share a key/value head are shown separately, so each shared KV head appears repeated across its group.
This tool is for understanding, not for high-throughput batch analysis.
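
Under grouped-query attention, several query heads read the same key/value head. The mapping reduces to integer division, assuming the contiguous grouping used by Hugging Face's `repeat_kv` helper, where each KV head is repeated num_q_heads // num_kv_heads times in a row. The function names below are hypothetical:

```python
def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Index of the key/value head that query head `q_head` attends with.

    Assumes contiguous groups: query heads [g*k, (g+1)*k) share KV head g,
    where k = num_q_heads // num_kv_heads. num_kv_heads == 1 is multi-query
    attention; num_kv_heads == num_q_heads is ordinary multi-head attention.
    """
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    if not 0 <= q_head < num_q_heads:
        raise ValueError("q_head out of range")
    return q_head // (num_q_heads // num_kv_heads)


def query_heads_sharing(kv_head, num_q_heads, num_kv_heads):
    """All query heads that read from `kv_head` (the repeated group)."""
    group = num_q_heads // num_kv_heads
    return list(range(kv_head * group, (kv_head + 1) * group))


print(kv_head_for_query_head(5, 32, 8), query_heads_sharing(0, 32, 8))
```

For example, Llama-3.1-8B's config lists 32 query heads and 8 key/value heads, so query heads 0 to 3 share KV head 0. Multi-query attention is the num_kv_heads = 1 case, in which every query head shares a single KV head.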