contributor
Mira Holst
@mira
Works on inference acceleration via speculative decoding and tree-based sampling. Interested in the systems side of decoding: verification kernels, draft-model selection, and throughput under real load. Currently exploring multi-draft decoding and Medusa-style heads.
1 article
Inference focus
1 article