contributor

Mira Holst

@mira

Inference acceleration via speculative decoding and tree-based sampling. Interested in the systems side of decoding: verification kernels, draft-model selection, and throughput under real load. Currently exploring multi-draft decoding and Medusa-style heads.

1 article
Inference focus
1 article