Realtime Package

The @hazeljs/realtime package provides low-latency voice AI with the OpenAI Realtime API. Connect via WebSocket for speech-to-speech conversations with sub-second latency — no separate STT → LLM → TTS pipeline.

Purpose

Building voice AI applications typically requires stitching together speech-to-text, an LLM, and text-to-speech — each adding latency and complexity. The @hazeljs/realtime package simplifies this by providing:

  • Speech-to-Speech — Native voice in, voice out — no intermediate text step
  • Low Latency — Sub-second response via WebSocket to OpenAI Realtime API
  • WebSocket Integration — Built on @hazeljs/websocket with @Realtime decorator
  • Configurable — Instructions, voice, output modalities per session
  • Bidirectional — Proxy client ↔ OpenAI; send audio, receive audio + text
  • Event-Driven — Forward any OpenAI Realtime client/server events

Architecture

graph TD
  A["Client<br/>(Browser, Mobile)"] -->|WebSocket| B["RealtimeGateway"]
  B --> C["RealtimeService"]
  C --> D["OpenAIRealtimeSession"]
  D -->|wss://api.openai.com/v1/realtime| E["OpenAI Realtime API"]
  E -->|Audio + Text| D
  D --> C
  C --> B
  B --> A
  
  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style B fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style C fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style D fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style E fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff

Key Components

  • RealtimeGateway — WebSocket gateway that proxies client connections to OpenAI
  • RealtimeService — Manages sessions, creates OpenAI connections per client
  • OpenAIRealtimeSession — Per-client session that forwards events bidirectionally
  • Auto-attach — Gateway attaches to HTTP server automatically via OnApplicationBootstrap

Advantages

1. Single Pipeline

No separate STT, LLM, and TTS services — one WebSocket connection handles everything.

2. Sub-Second Latency

Direct streaming to OpenAI Realtime API eliminates round-trip delays between pipeline stages.

3. Zero Boilerplate

Register RealtimeModule.forRoot() and the gateway attaches automatically when the app starts.

4. Flexible Output

Receive both audio and text in the same stream — use what you need for your UI.

5. Production Ready

Built on HazelJS WebSocket infrastructure with proper connection lifecycle, error handling, and session management.

Installation

npm install @hazeljs/realtime @hazeljs/core @hazeljs/websocket

Environment

Set OPENAI_API_KEY or pass openaiApiKey in RealtimeModule.forRoot().

Quick Start

1. Register Realtime Module

// app.module.ts
import { HazelModule } from '@hazeljs/core';
import { RealtimeModule } from '@hazeljs/realtime';

@HazelModule({
  imports: [
    RealtimeModule.forRoot({
      openaiApiKey: process.env.OPENAI_API_KEY,
      path: '/realtime',
      defaultSessionConfig: {
        instructions: 'You are a helpful voice assistant. Speak clearly and briefly.',
        voice: 'marin',
        outputModalities: ['audio', 'text'],
      },
    }),
  ],
})
export class AppModule {}

2. Bootstrap

// main.ts
import { HazelApp } from '@hazeljs/core';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = new HazelApp(AppModule);
  const port = parseInt(process.env.PORT ?? '3000', 10);

  await app.listen(port);

  console.log(`Realtime voice AI at ws://localhost:${port}/realtime`);
}

bootstrap().catch(console.error);

The RealtimeGateway is automatically attached to the HTTP server when the app starts listening (via OnApplicationBootstrap).

3. Connect from Client

const ws = new WebSocket('ws://localhost:3000/realtime');

ws.onopen = () => {
  // Optional: update session config
  ws.send(JSON.stringify({
    type: 'session.update',
    session: { instructions: 'Be extra friendly!' },
  }));
};

ws.onmessage = (e) => {
  const { event, data } = JSON.parse(e.data);
  if (event === 'realtime') {
    if (data.type === 'response.output_audio.delta') {
      // Play base64 PCM: data.delta
    }
  }
};

// Send audio (base64 PCM 24kHz)
ws.send(JSON.stringify({
  type: 'input_audio_buffer.append',
  audio: base64PcmChunk,
}));

Configuration

RealtimeModule.forRoot(options)

  • openaiApiKey (string) — OpenAI API key (falls back to the OPENAI_API_KEY env var)
  • path (string) — WebSocket path (default: /realtime)
  • defaultSessionConfig (RealtimeSessionConfig) — Default config applied to new sessions
  • defaultProvider ('openai' | 'gemini') — Provider to use (OpenAI supported first)

RealtimeSessionConfig

  • instructions (string) — System prompt for the model
  • voice (OpenAIVoice) — One of alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin, cedar
  • outputModalities (('audio' | 'text')[]) — Output modes (default: ['audio', 'text'])
  • inputFormat (RealtimeAudioFormat) — PCM input format (default: 24kHz)
  • turnDetection (boolean) — Enable server-side VAD (default: true)
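For example, turn detection can be disabled so the client controls turn-taking explicitly (push-to-talk). A hedged sketch using only the options documented above — exact typings assumed:

```typescript
// app.module.ts — sketch: disable server-side VAD.
// With turnDetection: false, the client must send input_audio_buffer.commit
// followed by response.create after each utterance.
import { HazelModule } from '@hazeljs/core';
import { RealtimeModule } from '@hazeljs/realtime';

@HazelModule({
  imports: [
    RealtimeModule.forRoot({
      openaiApiKey: process.env.OPENAI_API_KEY,
      defaultSessionConfig: {
        instructions: 'You are a concise voice assistant.',
        voice: 'marin',
        outputModalities: ['audio', 'text'],
        turnDetection: false, // client commits audio manually
      },
    }),
  ],
})
export class PushToTalkModule {}
```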

Client Events

Send any OpenAI Realtime client event over the WebSocket:

  • session.update — Update session config
  • input_audio_buffer.append — Send base64 PCM audio
  • input_audio_buffer.commit — Commit the audio buffer (required when VAD is disabled)
  • input_audio_buffer.clear — Clear the audio buffer
  • conversation.item.create — Add a text message to the conversation
  • response.create — Trigger a model response
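These events also support a text-only turn: create a conversation item, then request a response. A minimal sketch — the event shapes follow the OpenAI Realtime API, but the helper names here are made up for illustration:

```typescript
// conversation.item.create — add a user text message to the conversation
function buildTextItem(text: string) {
  return {
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text }],
    },
  };
}

// response.create — ask the model to respond to the conversation so far
function buildResponseRequest() {
  return { type: 'response.create' };
}

// Hypothetical helper: send both events over the gateway WebSocket as JSON.
function sendTextTurn(ws: { send(data: string): void }, text: string) {
  ws.send(JSON.stringify(buildTextItem(text)));
  ws.send(JSON.stringify(buildResponseRequest()));
}
```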

Server Events

You receive { event: 'realtime', data: <OpenAI server event> }:

  • session.created / session.updated — Session lifecycle
  • response.output_audio.delta — Audio chunk (base64)
  • response.output_audio.done — Audio complete
  • response.output_text.delta / response.output_text.done — Text stream
  • response.done — Response complete
  • input_audio_buffer.speech_started / speech_stopped — VAD events
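A client typically folds these events into UI state, e.g. accumulating the text stream and treating response.done as end-of-turn. A sketch — event names as in the table above, the accumulator itself is illustrative:

```typescript
// Accumulate streamed text from messages shaped as
// { event: 'realtime', data: <OpenAI server event> }.
type ServerMessage = { event: string; data: { type: string; delta?: string } };

function createTextAccumulator() {
  let text = '';
  let done = false;
  return {
    handle(raw: string) {
      const msg: ServerMessage = JSON.parse(raw);
      if (msg.event !== 'realtime') return;
      if (msg.data.type === 'response.output_text.delta') {
        text += msg.data.delta ?? ''; // append each text chunk
      } else if (msg.data.type === 'response.done') {
        done = true; // model finished this response
      }
    },
    snapshot() {
      return { text, done };
    },
  };
}
```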

Audio Format

  • Input: PCM 16-bit, 24kHz (or 8kHz for telephony)
  • Output: PCM 16-bit, 24kHz

Encode/decode base64 for transport over WebSocket.
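For instance, microphone samples (commonly Float32 in the Web Audio API) are scaled to 16-bit PCM and base64-encoded before sending; decoding reverses the steps for playback. A sketch using Node's Buffer — browsers would use btoa/atob or similar instead:

```typescript
import { Buffer } from 'node:buffer';

// Float32 samples in [-1, 1] → 16-bit PCM → base64.
function floatToPcm16Base64(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale to int16 range
  }
  return Buffer.from(pcm.buffer).toString('base64');
}

// base64 → Int16Array (e.g. for playing response.output_audio.delta chunks).
function pcm16Base64ToInt16(b64: string): Int16Array {
  const buf = Buffer.from(b64, 'base64');
  const out = new Int16Array(buf.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = buf.readInt16LE(i * 2); // PCM is little-endian
  }
  return out;
}
```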


Use Cases

  • Voice Assistants — Hands-free, low-latency voice interfaces
  • Call Centers — Real-time AI agents with natural speech
  • Accessibility — Voice-first interfaces
  • Robotics — Voice control for devices
  • Gaming — In-game voice NPCs

API Reference

RealtimeGateway

class RealtimeGateway extends WebSocketGateway {
  constructor(realtimeService: RealtimeService, options?: RealtimeGatewayOptions);
  attachToServer(server: HttpServer, options?: { path?: string; maxPayload?: number }): WebSocketServer;
}

RealtimeService

class RealtimeService {
  createOpenAISession(client: RealtimeClientAdapter, overrides?: {...}): Promise<OpenAIRealtimeSession>;
  getSession(clientId: string): OpenAIRealtimeSession | undefined;
  removeSession(clientId: string): void;
  getStats(): RealtimeSessionStats[];
}

What's Next?