
Why Does Your React App Stutter During LLM Streaming?
Stop letting erratic token bursts ruin your user experience: a dive into state batching and partial-content rendering strategies.
You’ve seen it: that little gray cursor blinking along as an LLM generates a response, except your entire browser tab feels like it’s wading through waist-deep molasses. You’re just appending text to a string, right? How hard can it be?
As it turns out, streaming text from an AI is a worst-case scenario for the React reconciliation engine. When you're receiving 30 to 60 tokens per second—which is common with fast providers like Groq or Together—you aren't just updating a string. You are triggering a full component re-render, a DOM update, and a layout recalculation dozens of times every single second.
If your UI feels "stuttery" or your cooling fans are starting to scream, you’re likely falling into one of these three traps.
The Firehose Problem
Most developers start with something that looks like this:
```js
// Don't actually do this for high-speed streams
const [completion, setCompletion] = useState("");

const handleStream = async (reader) => {
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const token = decoder.decode(value, { stream: true });
    // 🚩 This triggers a re-render for EVERY single token
    setCompletion((prev) => prev + token);
  }
};
```

The problem here is frequency. React 18 is smart about batching, but there’s nothing to batch here: every await reader.read() resolves as its own task, so each setCompletion call typically gets its own render, commit, and paint. You are effectively DoS-ing your own main thread. While the JavaScript engine is busy processing your state updates, it can’t handle user clicks or keep CSS animations smooth.
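Before fixing it, it’s worth measuring it. Here’s a small diagnostic hook I’d reach for (useRenderRate is just an illustrative name, not a library API): drop it into the component that renders the streaming text and watch how often it commits.

```js
import { useEffect, useRef } from 'react';

// Hypothetical diagnostic hook: logs how many renders the host component
// committed during each one-second window.
export function useRenderRate(label = 'stream view') {
  const renderCountRef = useRef(0);

  // No dependency array: this effect runs after every committed render
  useEffect(() => {
    renderCountRef.current += 1;
  });

  useEffect(() => {
    const id = setInterval(() => {
      console.log(`${label}: ${renderCountRef.current} renders in the last second`);
      renderCountRef.current = 0;
    }, 1000);
    return () => clearInterval(id);
  }, [label]);
}
```

With the naive loop above, the number you see is usually close to the provider’s token rate.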
Strategy 1: The Accumulator Buffer
The quickest win is to stop being so honest with your UI. Your user doesn't need to see the string grow every 10 milliseconds; their eyes literally cannot process text that fast.
Instead, use a "buffer" and a controlled interval to flush that buffer into your state.
```js
import { useRef, useState } from 'react';

export function useStreamedResponse() {
  const [text, setText] = useState("");
  // Tokens pile up in a ref, so they never trigger a render on their own
  const bufferRef = useRef("");

  const startStreaming = async (stream) => {
    const reader = stream.getReader();
    const decoder = new TextDecoder();

    // Periodically flush the buffer into React state
    const interval = setInterval(() => {
      if (bufferRef.current !== "") {
        // Grab and clear the buffer *before* handing it to React, so the
        // functional updater doesn't read a ref we've already emptied
        const chunk = bufferRef.current;
        bufferRef.current = "";
        setText((prev) => prev + chunk);
      }
    }, 80); // ~12 updates per second is plenty for the human eye

    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // { stream: true } keeps multi-byte characters intact across chunks
        bufferRef.current += decoder.decode(value, { stream: true });
      }
    } finally {
      clearInterval(interval);
      // Final flush
      const rest = bufferRef.current;
      bufferRef.current = "";
      setText((prev) => prev + rest);
    }
  };

  return { text, startStreaming };
}
```

By throttling state updates to one flush every ~80ms, you give the browser back the vast majority of its time between frames to handle things like scrolling and hover effects.
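For completeness, here’s how a component might consume the hook. This is a sketch, not a prescription: ChatResponse and the /api/chat endpoint are placeholder names, and it assumes the endpoint streams plain text chunks so response.body can be handed straight to startStreaming.

```jsx
import { useStreamedResponse } from './useStreamedResponse';

export function ChatResponse({ prompt }) {
  const { text, startStreaming } = useStreamedResponse();

  const handleAsk = async () => {
    // Placeholder endpoint; assumed to stream plain text chunks
    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
    });
    await startStreaming(response.body);
  };

  return (
    <div>
      <button onClick={handleAsk}>Ask</button>
      {/* Raw text for now; see the Markdown section below */}
      <div style={{ whiteSpace: 'pre-wrap' }}>{text}</div>
    </div>
  );
}
```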
The Markdown Tax
The stuttering often gets worse if you’re using something like react-markdown. Markdown parsers are usually designed to take a full string and turn it into a tree of components.
When you stream into a Markdown component, you aren't just adding a character; the parser has to re-parse the *entire* conversation from scratch on every single update. If your AI response is 2,000 words long, by the end of the stream, your CPU is re-calculating the syntax highlighting for the first paragraph for the 500th time.
How to fix the "Parsing Stutter":
1. Memoize heavily: Wrap your custom Markdown components (like code blocks) in React.memo (there's a sketch of this after the list).
2. CSS is your friend: Use white-space: pre-wrap; for the raw stream and only swap to the full Markdown-parsed version once the stream is finished (or if the user hits "pause").
3. Virtualization: If this is a chat app, ensure previous messages are virtualized or at least wrapped in React.memo so they don't re-render when the *current* message grows.
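Here’s a minimal sketch of points 1 and 3 combined, assuming react-markdown; MessageList and MarkdownMessage are names I’ve made up for illustration. Finished messages go through the parser once and are frozen by React.memo, while the in-flight message stays as cheap pre-wrapped text.

```jsx
import { memo } from 'react';
import ReactMarkdown from 'react-markdown';

// Finished messages: parsed once, then frozen by memo so re-renders of the
// parent (as the live message grows) skip them entirely
const MarkdownMessage = memo(function MarkdownMessage({ content }) {
  return <ReactMarkdown>{content}</ReactMarkdown>;
});

export function MessageList({ messages, streamingText }) {
  return (
    <div>
      {messages.map((m) => (
        <MarkdownMessage key={m.id} content={m.content} />
      ))}
      {/* The in-flight message: raw text until the stream finishes */}
      {streamingText && (
        <div style={{ whiteSpace: 'pre-wrap' }}>{streamingText}</div>
      )}
    </div>
  );
}
```

When the stream completes, append the finished text to messages and let it hit the Markdown parser exactly once.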
Concurrent React to the Rescue?
You might be tempted to wrap your setCompletion in startTransition.
```js
import { useTransition } from 'react';

const [isPending, startTransition] = useTransition();

// Inside the stream loop...
startTransition(() => {
  setCompletion((c) => c + token);
});
```

This tells React that the text update is a "low priority" transition. It helps keep the UI responsive (the "Stop" button will still work), but it doesn't solve the underlying issue of the sheer volume of work. In fact, it might actually make the text look *laggier* because React will purposefully delay the rendering of the text to keep the input fields snappy.
I’ve found that the Buffer Strategy mentioned above almost always yields a smoother visual experience than relying on Concurrent mode alone.
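That said, the two compose nicely if you want both: keep the buffer, and mark only the throttled flush as a transition. Here’s a minimal variation on the interval from the hook above, assuming React 18’s module-level startTransition export:

```js
import { startTransition } from 'react';

// Inside useStreamedResponse: same interval, but the flush is now a transition
const interval = setInterval(() => {
  if (bufferRef.current !== "") {
    const chunk = bufferRef.current;
    bufferRef.current = "";
    // Urgent updates (typing, clicking "Stop") can jump ahead of the
    // growing transcript
    startTransition(() => {
      setText((prev) => prev + chunk);
    });
  }
}, 80);
```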
The "Bottom of the Chat" Jitter
Finally, let's talk about scrollToBottom. If you’re triggering a window.scrollTo or el.scrollIntoView every time a token arrives, you are forcing the browser to perform a "Reflow." This is the most expensive operation a browser can do.
Instead of scrolling on every token, only scroll if the user is already near the bottom, and use the same throttled interval we used for the text update.
```js
// Simple logic to keep the user anchored
const isUserAtBottom = () => {
  const el = chatContainerRef.current;
  // Treat "within ~40px of the bottom" as anchored (the threshold is a judgment call)
  return el.scrollHeight - el.scrollTop - el.clientHeight < 40;
};

useEffect(() => {
  if (isUserAtBottom()) {
    chatContainerRef.current.scrollTop = chatContainerRef.current.scrollHeight;
  }
}, [text]); // 'text' is already throttled!
```

Summary
The "stutter" isn't a limitation of React; it's a conflict between how fast LLMs can talk and how fast the DOM can listen. By buffering your updates and being mindful of expensive Markdown re-parsing, you can go from a janky, high-CPU mess to a buttery-smooth AI experience.
Don't let your tokens move faster than your frames. Buffer them, render them in chunks, and your users (and their laptop batteries) will thank you.

