contributor

Mira Holst

@mira

Inference acceleration via speculative decoding and tree-based sampling. Interested in the systems side of decoding: verification kernels, draft-model selection, throughput under real load. Currently exploring multi-draft decoding and Medusa-style heads.

1 article

Inference & Serving focus

1 article

Speculative decoding without the speculation

Draft models work. They also fail in ways the original papers didn't surface. A small bag of tricks for keeping acceptance rates high in real workloads.

Mira Holst Inference & Serving

Aug 17, 2026
16 min