# Realtime Package
The @hazeljs/realtime package provides low-latency voice AI with the OpenAI Realtime API. Connect via WebSocket for speech-to-speech conversations with sub-second latency — no separate STT → LLM → TTS pipeline.
## Purpose
Building voice AI applications typically requires stitching together speech-to-text, an LLM, and text-to-speech — each adding latency and complexity. The @hazeljs/realtime package simplifies this by providing:
- Speech-to-Speech — Native voice in, voice out — no intermediate text step
- Low Latency — Sub-second response via WebSocket to OpenAI Realtime API
- WebSocket Integration — Built on @hazeljs/websocket with @Realtime decorator
- Configurable — Instructions, voice, output modalities per session
- Bidirectional — Proxy client ↔ OpenAI; send audio, receive audio + text
- Event-Driven — Forward any OpenAI Realtime client/server events
## Architecture

```mermaid
graph TD
    A["Client<br/>(Browser, Mobile)"] -->|WebSocket| B["RealtimeGateway"]
    B --> C["RealtimeService"]
    C --> D["OpenAIRealtimeSession"]
    D -->|wss://api.openai.com/v1/realtime| E["OpenAI Realtime API"]
    E -->|Audio + Text| D
    D --> C
    C --> B
    B --> A
    style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
    style B fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
    style C fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
    style D fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
    style E fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
```
### Key Components

- `RealtimeGateway` — WebSocket gateway that proxies client connections to OpenAI
- `RealtimeService` — Manages sessions, creates OpenAI connections per client
- `OpenAIRealtimeSession` — Per-client session that forwards events bidirectionally
- Auto-attach — Gateway attaches to the HTTP server automatically via `OnApplicationBootstrap`
## Advantages

### 1. Single Pipeline

No separate STT, LLM, and TTS services — one WebSocket connection handles everything.

### 2. Sub-Second Latency

Direct streaming to the OpenAI Realtime API eliminates round-trip delays between pipeline stages.

### 3. Zero Boilerplate

Register `RealtimeModule.forRoot()` and the gateway attaches automatically when the app starts.

### 4. Flexible Output

Receive both audio and text in the same stream — use what you need for your UI.

### 5. Production Ready

Built on HazelJS WebSocket infrastructure with proper connection lifecycle, error handling, and session management.
## Installation

```bash
npm install @hazeljs/realtime @hazeljs/core @hazeljs/websocket
```
### Environment

Set `OPENAI_API_KEY` or pass `openaiApiKey` to `RealtimeModule.forRoot()`.
## Quick Start

### 1. Register Realtime Module
```typescript
// app.module.ts
import { HazelModule } from '@hazeljs/core';
import { RealtimeModule } from '@hazeljs/realtime';

@HazelModule({
  imports: [
    RealtimeModule.forRoot({
      openaiApiKey: process.env.OPENAI_API_KEY,
      path: '/realtime',
      defaultSessionConfig: {
        instructions: 'You are a helpful voice assistant. Speak clearly and briefly.',
        voice: 'marin',
        outputModalities: ['audio', 'text'],
      },
    }),
  ],
})
export class AppModule {}
```
### 2. Bootstrap
```typescript
// main.ts
import { HazelApp } from '@hazeljs/core';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = new HazelApp(AppModule);
  const port = parseInt(process.env.PORT ?? '3000', 10);
  await app.listen(port);
  console.log(`Realtime voice AI at ws://localhost:${port}/realtime`);
}

bootstrap().catch(console.error);
```
The RealtimeGateway is automatically attached to the HTTP server when the app starts listening (via OnApplicationBootstrap).
### 3. Connect from Client
```typescript
const ws = new WebSocket('ws://localhost:3000/realtime');

ws.onopen = () => {
  // Optional: update session config
  ws.send(JSON.stringify({
    type: 'session.update',
    session: { instructions: 'Be extra friendly!' },
  }));
};

ws.onmessage = (e) => {
  const { event, data } = JSON.parse(e.data);
  if (event === 'realtime') {
    if (data.type === 'response.output_audio.delta') {
      // Play base64 PCM: data.delta
    }
  }
};

// Send audio (base64 PCM 24kHz)
ws.send(JSON.stringify({
  type: 'input_audio_buffer.append',
  audio: base64PcmChunk,
}));
```
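The `base64PcmChunk` above is assumed to come from microphone capture. A minimal sketch of producing it: the Web Audio API hands you `Float32Array` samples in the range -1..1, which need to be converted to 16-bit PCM and then base64-encoded. The helper names below are illustrative, not part of `@hazeljs/realtime`; `Buffer` is used for the base64 step (Node), while a browser would base64-encode the bytes with `btoa` instead.

```typescript
// Convert Float32 audio samples (range -1..1, as produced by the Web Audio
// API) into 16-bit little-endian PCM. Helper names are illustrative.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to valid range
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale to int16
  }
  return pcm;
}

// Base64-encode the raw PCM bytes for the WebSocket payload.
// Node: Buffer wraps the underlying bytes; browsers would use btoa instead.
function pcmToBase64(pcm: Int16Array): string {
  return Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength).toString('base64');
}
```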
## Configuration

### RealtimeModule.forRoot(options)

| Option | Type | Description |
|---|---|---|
| `openaiApiKey` | `string` | OpenAI API key (or use the `OPENAI_API_KEY` env var) |
| `path` | `string` | WebSocket path (default: `/realtime`) |
| `defaultSessionConfig` | `RealtimeSessionConfig` | Default session config |
| `defaultProvider` | `'openai' \| 'gemini'` | Provider (OpenAI supported first) |
### RealtimeSessionConfig

| Option | Type | Description |
|---|---|---|
| `instructions` | `string` | System prompt for the model |
| `voice` | `OpenAIVoice` | `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, `cedar` |
| `outputModalities` | `('audio' \| 'text')[]` | Output modes (default: `['audio', 'text']`) |
| `inputFormat` | `RealtimeAudioFormat` | PCM format (default: 24kHz) |
| `turnDetection` | `boolean` | Enable VAD (default: `true`) |
## Client Events

Send any OpenAI Realtime client event over the WebSocket:

| Event | Description |
|---|---|
| `session.update` | Update session config |
| `input_audio_buffer.append` | Send base64 PCM audio |
| `input_audio_buffer.commit` | Commit buffer (when VAD is disabled) |
| `input_audio_buffer.clear` | Clear buffer |
| `conversation.item.create` | Add a text message |
| `response.create` | Trigger a model response |
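With `turnDetection` disabled, the client drives each turn explicitly: append audio, commit the buffer, then request a response. A sketch of such a sequence, with event shapes following OpenAI's Realtime client-event schema (the message text here is illustrative):

```typescript
// Manual turn when VAD is disabled: append audio, commit the buffer,
// then explicitly ask the model to respond. The same envelope also
// carries text-only turns via conversation.item.create.
const events = [
  { type: 'input_audio_buffer.append', audio: '<base64 pcm chunk>' },
  { type: 'input_audio_buffer.commit' },
  {
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text: 'Summarize what I just said.' }],
    },
  },
  { type: 'response.create' },
];

// Each event is sent as a JSON string over the gateway's WebSocket:
const frames = events.map((e) => JSON.stringify(e));
// frames.forEach((f) => ws.send(f));
```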
## Server Events

You receive `{ event: 'realtime', data: <OpenAI server event> }`:

| Event | Description |
|---|---|
| `session.created` / `session.updated` | Session lifecycle |
| `response.output_audio.delta` | Audio chunk (base64) |
| `response.output_audio.done` | Audio complete |
| `response.output_text.delta` / `response.output_text.done` | Text stream |
| `response.done` | Response complete |
| `input_audio_buffer.speech_started` / `speech_stopped` | VAD events |
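Text arrives as incremental deltas. A small reducer (function and type names are illustrative) that assembles the full transcript from the `{ event: 'realtime', data }` envelopes the gateway forwards:

```typescript
// Accumulate response.output_text.delta chunks into a complete string.
// The envelope shape { event: 'realtime', data } matches what the gateway
// forwards; names here are illustrative, not part of @hazeljs/realtime.
type RealtimeEnvelope = { event: string; data: { type: string; delta?: string } };

function collectText(frames: RealtimeEnvelope[]): string {
  let text = '';
  for (const { event, data } of frames) {
    if (event !== 'realtime') continue; // ignore non-realtime envelopes
    if (data.type === 'response.output_text.delta' && data.delta) {
      text += data.delta;
    }
  }
  return text;
}
```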
## Audio Format

- Input: PCM 16-bit, 24kHz (or 8kHz for telephony)
- Output: PCM 16-bit, 24kHz

Audio is base64-encoded in both directions for transport over the WebSocket.
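For capacity planning, the bandwidth implied by this format is easy to work out: mono 16-bit PCM at 24kHz is 48,000 bytes per second raw, and base64 inflates that by a factor of 4/3 (every 3 bytes become 4 characters):

```typescript
// Raw bandwidth for mono PCM16 at 24 kHz, and the base64-inflated size
// actually sent over the WebSocket (base64 encodes 3 bytes as 4 chars).
const sampleRate = 24_000;      // samples per second
const bytesPerSample = 2;       // 16-bit PCM
const rawBytesPerSecond = sampleRate * bytesPerSample;
const base64BytesPerSecond = Math.ceil(rawBytesPerSecond / 3) * 4;
```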
## Use Cases
- Voice Assistants — Hands-free, low-latency voice interfaces
- Call Centers — Real-time AI agents with natural speech
- Accessibility — Voice-first interfaces
- Robotics — Voice control for devices
- Gaming — In-game voice NPCs
## API Reference

### RealtimeGateway

```typescript
class RealtimeGateway extends WebSocketGateway {
  constructor(realtimeService: RealtimeService, options?: RealtimeGatewayOptions);

  attachToServer(
    server: HttpServer,
    options?: { path?: string; maxPayload?: number },
  ): WebSocketServer;
}
```
### RealtimeService

```typescript
class RealtimeService {
  createOpenAISession(
    client: RealtimeClientAdapter,
    overrides?: {...},
  ): Promise<OpenAIRealtimeSession>;
  getSession(clientId: string): OpenAIRealtimeSession | undefined;
  removeSession(clientId: string): void;
  getStats(): RealtimeSessionStats[];
}
```
## What's Next?
- Explore WebSocket Package for the underlying gateway infrastructure
- Check out hazeljs-realtime-voice-starter for a complete example
- Read the OpenAI Realtime API documentation