Machine learning,
from kernels to clusters.

An open community for learning, writing, and tinkering on the infrastructure behind modern AI: inference engines, training systems, ML stacks, and everything in between.

Read the archive →

Write with us


FIG. 1.1 Per-head attention pattern at layer 14. Causal mask + induction circuit + sink token, visualized over one sentence.

Latest

Recently published.

All articles →

Architecture

How a Transformer Really Works: Attention, the KV Cache, and Why Inference Eats Memory

A from-scratch tour of what's actually inside an LLM: how a transformer turns tokens into predictions, what Query, Key, and Value really mean, and how generating text one token at a time builds the KV cache — the growing pool of memory that makes inference so expensive.

Dinesh · Jul 21, 2026 · 10 min

Training Systems

Every Mask in a Transformer, Untangled

The word "mask" means at least four unrelated things in deep learning: what a token can see, what counts toward the loss, what is hidden to create a task, and what is randomly dropped. This is a single field guide to all of them: why each exists and what breaks without it.

Dinesh · Jul 20, 2026 · 9 min

Training Systems

An Intuitive Guide to LoRA: Fine-Tuning a Model by Training 0.2% of Its Weights

You don't need a massive tech budget or a cluster of high-end GPUs to fine-tune your own model. LoRA lets developers adapt large models on far more modest hardware, in some cases a single consumer GPU. Here is the zero-jargon, first-principles explanation of the clever shortcut that leveled the playing field.

Dinesh · Jul 19, 2026 · 12 min

Architecture

Neural Networks From Zero: From a Single Number to a Billion Parameters

A neural network never sees a word, an image, or a sound — only a list of numbers. Starting from that one fact and a single neuron, this guide builds the whole machine: how any input becomes numbers, why weights, biases, and activations each exist, and how neurons stack into layers and layers into a model.

Dinesh · Jul 12, 2026 · 14 min

Index

Browse by topic.

Full index →

Tools

Run the math yourself.

Attention Visualizer

Throughput Calculator

Training Memory Calculator

Eval Harness Playground

Model Card Generator

Kernel Benchmark

All tools →

Discourse

The conversation.

Where to talk →

Forum Discussions

Ask questions, share what you built, run a poll, or just discuss ML systems.

Open the Forum →

Discord

Real-time chat for the working day. Quick questions, debugging help, paper club, and the occasional argument about whether MoE is overrated.

Join the server →

Share knowledge that
moves the field forward.