Building Smooth Chat Interfaces with OpenAI Responses API

By Vance Denson

I created an OpenAI chatbot application that lets you all talk with my resume; see the live demo HERE. The Responses API is quick to prototype with and delivers a smooth streaming response experience. Enjoy my explanation below! - Vance.

The OpenAI Responses API provides a powerful way to build AI-powered chat interfaces with streaming support and prompt templates. This post explores the API's key features and the design patterns that enable smooth, responsive chat experiences.

Understanding the Responses API

The Responses API is OpenAI's latest approach to building conversational AI applications. Unlike the Chat Completions API, it's designed around prompt templates and response objects, providing more structure and control over conversations.

Key API Options

const requestParams = {
  model: 'gpt-4o',
  temperature: 0.7,
  max_output_tokens: 1000,
  stream: true,
  prompt: {
    id: 'pmpt_69260b7e05408197951e2852cb1980d101e58c8cef3159d9',
    version: '6',
    variables: {
      user_prompt: input,
    },
  },
  text: {
    format: {
      type: 'text',
    },
  },
};

Key Parameters:

  • prompt: Reference to a pre-configured prompt template with variables
  • stream: Enable Server-Sent Events (SSE) streaming
  • temperature: Controls response randomness (0-2)
  • max_output_tokens: Limit response length
  • text.format: Specify output format (text or JSON schema)

The Prompt Template Pattern

Instead of sending full conversation histories, the Responses API uses prompt templates stored in OpenAI's system. This pattern offers several advantages:

Benefits

  1. Centralized Prompt Management: Update prompts without code changes
  2. Version Control: Track prompt versions (version: '6')
  3. Variable Injection: Pass dynamic values via variables
  4. Consistency: Same prompt structure across requests

Implementation

// API Route: /api/chat/route.ts
const requestParams = {
  model: options?.model || 'gpt-4o',
  stream: true,
  prompt: {
    id: 'pmpt_69260b7e05408197951e2852cb1980d101e58c8cef3159d9',
    version: '6',
    variables: {
      user_prompt: input, // User input injected here
    },
  },
  text: {
    format: { type: 'text' },
  },
};
 
const response = await openai.responses.create(requestParams);

Streaming with Server-Sent Events

Streaming is essential for responsive chat interfaces. The Responses API supports SSE streaming, delivering tokens as they're generated.

Server Implementation

// Convert OpenAI stream to SSE format
const stream = new ReadableStream({
  async start(controller) {
    const encoder = new TextEncoder();
    const streamResponse = response as AsyncIterable<any>;
 
    for await (const chunk of streamResponse) {
      const data = JSON.stringify(chunk);
      controller.enqueue(encoder.encode(`data: ${data}\n\n`));
    }
 
    controller.enqueue(encoder.encode(`data: {"type":"done"}\n\n`));
    controller.close();
  },
});
 
return new Response(stream, {
  headers: {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  },
});
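The framing used in the server code above (a `data:` line followed by a blank line per event) can be captured in one small helper. This is a sketch of the SSE wire format, not an SDK API; the function name is mine:

```typescript
// Sketch: the SSE wire framing used by the server code above.
// Each event is serialized as a `data:` line terminated by a blank line,
// which is what the client-side parser later splits on.
function sseFrame(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}
```

On the server, this string would be passed through a TextEncoder before controller.enqueue, exactly as in the loop above.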

Stream Event Types

The API emits different event types during streaming:

  • response.output_text.delta: Text chunks (new format)
  • response.output_item.delta: Legacy nested format
  • response.output_item.added: New output item
  • response.created: Response initialization
  • response.done: Stream completion
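The event names above can be modeled as a TypeScript discriminated union, which makes the frontend handler exhaustive and type-safe. The field shapes here (notably delta as a plain string) follow the handler code later in this post; the SDK's own type definitions are authoritative:

```typescript
// Sketch: modeling the stream event types listed above as a discriminated
// union. Only the fields this post's handler actually reads are included.
type StreamEvent =
  | { type: 'response.created' }
  | { type: 'response.output_item.added' }
  | { type: 'response.output_text.delta'; delta: string }
  | { type: 'response.done' };

// Accumulate the text deltas from a sequence of stream events.
function accumulateText(events: StreamEvent[]): string {
  let text = '';
  for (const event of events) {
    if (event.type === 'response.output_text.delta') {
      text += event.delta;
    }
  }
  return text;
}
```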

Frontend Streaming Handler

The frontend processes SSE streams and updates the UI in real-time:

// Client-side stream processing
const reader = response.body?.getReader();
const decoder = new TextDecoder();
let buffer = '';
let currentText = '';
 
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
 
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || '';
 
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
 
      if (data.type === 'response.output_text.delta') {
        // Direct text delta format
        if (data.delta && typeof data.delta === 'string') {
          currentText += data.delta;
          setMessages((prev) =>
            prev.map((msg) =>
              msg.id === assistantMessageId
                ? { ...msg, content: currentText }
                : msg
            )
          );
        }
      } else if (data.type === 'response.done') {
        setIsStreaming(false);
      }
    }
  }
}
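The buffering step in the loop above (append the chunk, split on newlines, carry the trailing partial line forward) is the piece most worth testing in isolation, since SSE frames routinely arrive split across reads. A minimal sketch, with a helper name of my own choosing:

```typescript
// Sketch: the SSE buffering logic from the handler above, factored into a
// pure function. Returns the complete lines found so far plus the leftover
// partial line to carry into the next read.
function splitSSEBuffer(
  buffer: string,
  chunk: string
): { lines: string[]; rest: string } {
  const lines = (buffer + chunk).split('\n');
  const rest = lines.pop() ?? '';
  return { lines, rest };
}
```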

Design Patterns for Smooth UX

1. Optimistic UI Updates

Create assistant message placeholders immediately:

// Add placeholder before API call
const assistantMessage: ChatMessageType = {
  id: `assistant-${Date.now()}`,
  role: 'assistant',
  content: '',
  timestamp: new Date(),
};
setMessages((prev) => [...prev, assistantMessage]);

2. Incremental Text Updates

Update message content incrementally as tokens arrive:

// Accumulate text and update UI
currentText += data.delta;
setMessages((prev) =>
  prev.map((msg) =>
    msg.id === assistantMessageId
      ? { ...msg, content: currentText }
      : msg
  )
);

3. Request Cancellation

Allow users to cancel in-flight requests:

const abortControllerRef = React.useRef<AbortController | null>(null);
 
// Cancel previous request
if (abortControllerRef.current) {
  abortControllerRef.current.abort();
}
abortControllerRef.current = new AbortController();
 
const response = await fetch('/api/chat', {
  signal: abortControllerRef.current.signal,
});
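When a fetch is aborted this way, its promise rejects with an error named "AbortError". That case is a user-initiated cancel, not a failure, so the catch block should swallow it rather than show an error state. A minimal sketch of that check (the function name is mine):

```typescript
// Sketch: distinguishing a user-initiated cancel from a real failure.
// Aborting a fetch via AbortController rejects with an error whose
// `name` is "AbortError"; checking the name avoids assuming a specific
// error class across runtimes.
function isUserCancellation(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { name?: string }).name === 'AbortError'
  );
}
```

In the catch block around the fetch, `if (isUserCancellation(err)) return;` would exit quietly before any error UI is shown.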

4. Error Handling

Handle errors gracefully across the interface, not just in the stream. The demo's voice-input feature, for example, fails quietly when microphone permission is denied:

recognition.onerror = (event: SpeechRecognitionErrorEvent) => {
  if (event.error === 'not-allowed') {
    // Permission denied - silently fail
    console.log('Microphone permission denied');
  }
  setIsListening(false);
};
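On the stream side, the main hazard is a malformed data line throwing inside JSON.parse and killing the whole read loop. A hedged sketch of defensive parsing (parseDataLine is a hypothetical helper, not part of the SDK):

```typescript
// Sketch: guarding the stream parser against malformed data lines.
// A truncated or non-JSON payload should be skipped, not allowed to
// abort the stream by throwing out of the read loop.
function parseDataLine(line: string): unknown {
  if (!line.startsWith('data: ')) return null;
  try {
    return JSON.parse(line.slice(6));
  } catch {
    return null; // malformed chunk: skip rather than abort the stream
  }
}
```

In the frontend handler shown earlier, this would replace the bare `JSON.parse(line.slice(6))` call, with a null check before inspecting `data.type`.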

Complete Flow

User Input → API Route → OpenAI Responses API
                ↓
         Prompt Template (with variables)
                ↓
         Streaming Response (SSE)
                ↓
         Frontend Stream Parser
                ↓
         Incremental UI Updates
                ↓
         Smooth Chat Experience

Key Takeaways

  1. Prompt Templates: Use centralized, versioned prompts with variable injection
  2. Streaming: Implement SSE for real-time token delivery
  3. Optimistic Updates: Show placeholders immediately for better perceived performance
  4. Error Handling: Gracefully handle permission and API errors
  5. Cancellation: Support aborting in-flight requests

The Responses API's prompt template pattern simplifies prompt management while streaming ensures responsive user experiences. Combined with proper frontend handling, this creates smooth, production-ready chat interfaces.