Streaming LLM Responses in a React Frontend: A Complete Pipeline Guide

[Image: React chat interface showing a streaming LLM text response in real time]

When you chat with ChatGPT, the text appears word by word in real time. Users love this because it feels responsive and natural. If your React app calls an LLM API and waits for the entire response before showing anything, users stare at a loading spinner for 5 to 30 seconds. Streaming fixes this by displaying each token the moment it arrives from the API.

How LLM Streaming Works

Most LLM APIs, including OpenAI, Anthropic, and Google Gemini, support streaming responses, usually enabled with a stream flag on the request. Instead of returning one big JSON blob, the API sends back small chunks of text as the model generates them. The most common transports are Server-Sent Events (SSE) and chunked HTTP responses. Your frontend reads these chunks as they arrive and appends them to the displayed text.
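As a concrete illustration, OpenAI-style SSE streams deliver each event as a line beginning with "data: " and end with a "data: [DONE]" sentinel. A small parser, sketched here under that assumption (the exact event shape varies by provider), pulls the payloads out of a raw chunk:

```typescript
// Extract the "data:" payloads from one raw SSE chunk.
// Assumes an OpenAI-style stream, where each event is a line of the form
// "data: {...json...}" and "data: [DONE]" marks the end of the stream.
function parseSSEChunk(chunk: string): string[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length).trim())
    .filter((payload) => payload !== "[DONE]");
}
```

Each returned payload is typically a JSON object containing the next text delta, which you parse and append to the UI.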

Setting Up the Backend Proxy

Never call LLM APIs directly from your React frontend — that exposes your API keys. Create a simple backend route (using Next.js API routes, Express, or any server) that receives the user prompt, calls the LLM API with streaming enabled, and forwards the stream to your frontend. In Next.js, you can use a Route Handler that returns a ReadableStream. The backend acts as a secure proxy between your users and the LLM.
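A minimal Route Handler sketch for Next.js is shown below. It assumes the OpenAI Chat Completions API; the endpoint, model name, and OPENAI_API_KEY environment variable are illustrative, so swap in your provider's details:

```typescript
// app/api/chat/route.ts: hypothetical Next.js Route Handler acting as the proxy.
// Assumes the OpenAI Chat Completions API; the endpoint, model, and env var
// names are illustrative. Adapt them to your provider.
export async function POST(req: Request): Promise<Response> {
  const { prompt } = await req.json();

  const upstream = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // The key never leaves the server, which is the point of the proxy.
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });

  // Forward the upstream SSE byte stream to the browser unchanged.
  return new Response(upstream.body, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```

Because the handler simply forwards upstream.body, the browser sees the same chunked stream the provider produced, with no buffering in between.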

The React Streaming Hook

Create a custom React hook called useStreamingResponse. It takes a prompt, sends a fetch request to your backend, and reads the response body, which is a ReadableStream, chunk by chunk using a reader. As each chunk arrives, decode it with TextDecoder and append it to a state variable. The component re-renders on every chunk, creating the live typing effect. Pass an AbortController signal to fetch so users can cancel long responses.

  • Send a fetch request to your backend endpoint, passing the signal from an AbortController
  • Get a reader from response.body using getReader()
  • Read chunks in a while loop using reader.read() until done is true
  • Decode each chunk with TextDecoder and append it to the message state
  • Handle errors, and cancel the reader when the component unmounts
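The read loop in the steps above can be factored into a framework-free helper, sketched here, that a hypothetical useStreamingResponse hook would call with the fetch response body; every decoded piece of text goes to a callback:

```typescript
// Core read loop of the streaming hook: pull chunks from a fetch response
// body, decode them, and pass each piece of text to a callback. Kept free of
// React so the logic can be tested on its own; a hook wraps it with state updates.
async function streamToCallback(
  stream: ReadableStream<Uint8Array>,
  onText: (text: string) => void
): Promise<void> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true correctly handles multi-byte characters split across chunks
      onText(decoder.decode(value, { stream: true }));
    }
  } finally {
    reader.releaseLock();
  }
}
```

Inside the hook, the wiring would look roughly like: fetch("/api/chat", { method: "POST", body: JSON.stringify({ prompt }), signal: controller.signal }), followed by streamToCallback(res.body!, text => setMessage(prev => prev + text)); calling controller.abort() in the effect cleanup cancels the request on unmount.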
[Image: Code editor showing a React streaming hook implementation]

Building the Chat UI Component

The chat component renders a list of messages. Each message has a role (user or assistant) and content. While the assistant message is streaming, show a blinking cursor at the end of the text. Use a ref to auto-scroll the message container to the bottom as new text arrives. Add a text input at the bottom with a send button that calls the streaming hook when submitted.
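The message-list state can be modeled with a reducer, sketched below under the assumption that the streaming hook dispatches one action per streamed chunk; the component then just maps over messages and renders the blinking cursor whenever the last assistant message is still marked as streaming:

```typescript
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
  streaming?: boolean; // true while tokens are still arriving
}

type ChatAction =
  | { type: "send"; prompt: string } // user submits the input
  | { type: "token"; text: string }  // one streamed chunk arrived
  | { type: "done" };                // stream finished

// Pure reducer for the chat transcript; plug it into React's useReducer.
function chatReducer(messages: Message[], action: ChatAction): Message[] {
  switch (action.type) {
    case "send":
      // Append the user message plus an empty assistant message to stream into.
      return [
        ...messages,
        { role: "user", content: action.prompt },
        { role: "assistant", content: "", streaming: true },
      ];
    case "token": {
      const last = messages[messages.length - 1];
      return [
        ...messages.slice(0, -1),
        { ...last, content: last.content + action.text },
      ];
    }
    case "done": {
      const last = messages[messages.length - 1];
      return [...messages.slice(0, -1), { ...last, streaming: false }];
    }
  }
}
```

Keeping this logic out of the component means the append-token path, which runs on every chunk, is trivial to test without rendering anything.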

Handling Edge Cases

Real-world streaming needs solid error handling. If the stream disconnects mid-response, show a retry button. If the user navigates away during streaming, abort the request to avoid memory leaks. If the API returns an error chunk, parse it and display a friendly message instead of crashing the UI. These details separate a demo from a production-ready chat interface.
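For the error-chunk case, a small classifier keeps the UI code simple. This sketch assumes an OpenAI-style in-stream error object ({ "error": { "message": ... } }), which varies by provider:

```typescript
// Decide whether a decoded chunk payload is normal text or an in-stream
// error. Assumes an OpenAI-style error shape; adjust for your provider.
function classifyChunk(payload: string): { kind: "text" | "error"; message: string } {
  try {
    const parsed = JSON.parse(payload);
    if (parsed && typeof parsed === "object" && "error" in parsed) {
      return {
        kind: "error",
        message: parsed.error?.message ?? "The model returned an error.",
      };
    }
  } catch {
    // Not JSON at all: treat it as plain text.
  }
  return { kind: "text", message: payload };
}
```

When classifyChunk reports an error, stop the stream, mark the assistant message as failed, and render the friendly message alongside the retry button instead of appending raw JSON to the chat.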