Thinking Machines: Defeating Nondeterminism in LLM Inference
The first post from Thinking Machines, the new post-OpenAI endeavour from Mira Murati and others. It gets deep into the nuts and bolts, but the main point:
In other words, the primary reason nearly all LLM inference endpoints are nondeterministic is that the load (and thus batch-size) nondeterministically varies!
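The mechanism behind this is floating-point non-associativity: when the batch size changes, GPU kernels can reduce the same numbers in a different order, and reordered floating-point sums need not agree bit-for-bit. Here is a minimal sketch (my own illustration, not code from the post; `grouped_sum` is a hypothetical stand-in for a kernel whose reduction order depends on batch shape):

```python
import numpy as np

# Floating-point addition is not associative: the same three numbers,
# grouped differently, give different results.
a = (0.1 + 1e20) - 1e20   # 0.1 is absorbed by 1e20, then cancelled away
b = 0.1 + (1e20 - 1e20)   # the huge terms cancel first, leaving 0.1
print(a, b)  # prints: 0.0 0.1

# A stand-in for a batch-size-dependent reduction: sum the same 4096
# values in fixed-size groups, then sum the partials. Different group
# sizes mean different addition orders.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)

def grouped_sum(values, group):
    # Partial sums per group, then a final sum over the partials --
    # loosely mimicking how a kernel's reduction tree depends on tiling.
    partials = [values[i:i + group].sum() for i in range(0, len(values), group)]
    return np.float32(sum(partials))

# The two results are mathematically equal but may differ in the last bits.
print(grouped_sum(x, 32), grouped_sum(x, 512))
```

In a server, the "group size" is effectively set by how many other requests happen to be in flight, which is why identical prompts can yield different completions even at temperature 0.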