# Realtime Package
The @hazeljs/realtime package provides low-latency voice AI with the OpenAI Realtime API. Connect via WebSocket for speech-to-speech conversations with sub-second latency — no separate STT → LLM → TTS pipeline.
## Purpose
Building voice AI applications typically requires stitching together speech-to-text, an LLM, and text-to-speech — each adding latency and complexity. The @hazeljs/realtime package simplifies this by providing:
- Speech-to-Speech — Native voice in, voice out — no intermediate text step
- Low Latency — Sub-second response via WebSocket to OpenAI Realtime API
- WebSocket Integration — Built on @hazeljs/websocket with @Realtime decorator
- Configurable — Instructions, voice, output modalities per session
- Bidirectional — Proxy client ↔ OpenAI; send audio, receive audio + text
- Event-Driven — Forward any OpenAI Realtime client/server events
## Architecture

```mermaid
graph TD
    A["Client<br/>(Browser, Mobile)"] -->|WebSocket| B["RealtimeGateway"]
    B --> C["RealtimeService"]
    C --> D["OpenAIRealtimeSession"]
    D -->|wss://api.openai.com/v1/realtime| E["OpenAI Realtime API"]
    E -->|Audio + Text| D
    D --> C
    C --> B
    B --> A
    style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
    style B fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
    style C fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
    style D fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
    style E fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
```
### Key Components

- `RealtimeGateway` — WebSocket gateway that proxies client connections to OpenAI
- `RealtimeService` — Manages sessions, creates OpenAI connections per client
- `OpenAIRealtimeSession` — Per-client session that forwards events bidirectionally
- Auto-attach — Gateway attaches to the HTTP server automatically via `OnApplicationBootstrap`
## Advantages

### 1. Single Pipeline

No separate STT, LLM, and TTS services — one WebSocket connection handles everything.

### 2. Sub-Second Latency

Direct streaming to the OpenAI Realtime API eliminates round-trip delays between pipeline stages.

### 3. Zero Boilerplate

Register `RealtimeModule.forRoot()` and the gateway attaches automatically when the app starts.

### 4. Flexible Output

Receive both audio and text in the same stream — use what you need for your UI.

### 5. Production Ready

Built on HazelJS WebSocket infrastructure with proper connection lifecycle, error handling, and session management.
## Installation

```bash
npm install @hazeljs/realtime @hazeljs/core @hazeljs/websocket
```
### Environment

Set `OPENAI_API_KEY` or pass `openaiApiKey` to `RealtimeModule.forRoot()`.
## Quick Start

### 1. Register Realtime Module
```typescript
// app.module.ts
import { HazelModule } from '@hazeljs/core';
import { RealtimeModule } from '@hazeljs/realtime';

@HazelModule({
  imports: [
    RealtimeModule.forRoot({
      openaiApiKey: process.env.OPENAI_API_KEY,
      path: '/realtime',
      defaultSessionConfig: {
        instructions: 'You are a helpful voice assistant. Speak clearly and briefly.',
        voice: 'marin',
        outputModalities: ['audio', 'text'],
      },
    }),
  ],
})
export class AppModule {}
```
### 2. Bootstrap
```typescript
// main.ts
import { HazelApp } from '@hazeljs/core';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = new HazelApp(AppModule);
  const port = parseInt(process.env.PORT ?? '3000', 10);
  await app.listen(port);
  console.log(`Realtime voice AI at ws://localhost:${port}/realtime`);
}

bootstrap().catch(console.error);
```
The RealtimeGateway is automatically attached to the HTTP server when the app starts listening (via OnApplicationBootstrap).
### 3. Connect from Client
```typescript
const ws = new WebSocket('ws://localhost:3000/realtime');

ws.onopen = () => {
  // Optional: update session config
  ws.send(JSON.stringify({
    type: 'session.update',
    session: { instructions: 'Be extra friendly!' },
  }));
};

ws.onmessage = (e) => {
  const { event, data } = JSON.parse(e.data);
  if (event === 'realtime') {
    if (data.type === 'response.output_audio.delta') {
      // Play base64 PCM: data.delta
    }
  }
};

// Send audio (base64 PCM 24kHz)
ws.send(JSON.stringify({
  type: 'input_audio_buffer.append',
  audio: base64PcmChunk,
}));
```
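The `base64PcmChunk` above is assumed to come from microphone capture. A minimal sketch of producing it: the Web Audio API hands you `Float32Array` samples in the range -1..1, which need to be converted to 16-bit PCM and then base64-encoded. The helper names below are illustrative, not part of `@hazeljs/realtime`; `Buffer` is used for the base64 step (Node), while a browser would base64-encode the bytes with `btoa` instead.

```typescript
// Convert Float32 audio samples (range -1..1, as produced by the Web Audio
// API) into 16-bit little-endian PCM. Helper names are illustrative.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to valid range
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale to int16
  }
  return pcm;
}

// Base64-encode the raw PCM bytes for the WebSocket payload.
// Node: Buffer wraps the underlying bytes; browsers would use btoa instead.
function pcmToBase64(pcm: Int16Array): string {
  return Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength).toString('base64');
}
```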
## Configuration

### RealtimeModule.forRoot(options)

| Option | Type | Description |
|---|---|---|
| `openaiApiKey` | `string` | OpenAI API key (or use the `OPENAI_API_KEY` env var) |
| `path` | `string` | WebSocket path (default: `/realtime`) |
| `defaultSessionConfig` | `RealtimeSessionConfig` | Default session config |
| `defaultProvider` | `'openai' \| 'gemini'` | Provider (OpenAI supported first) |
### RealtimeSessionConfig

| Option | Type | Description |
|---|---|---|
| `instructions` | `string` | System prompt for the model |
| `voice` | `OpenAIVoice` | `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, `cedar` |
| `outputModalities` | `('audio' \| 'text')[]` | Output modes (default: `['audio', 'text']`) |
| `inputFormat` | `RealtimeAudioFormat` | PCM format (default: 24kHz) |
| `turnDetection` | `boolean` | Enable VAD (default: `true`) |
## Client Events

Send any OpenAI Realtime client event over the WebSocket:

| Event | Description |
|---|---|
| `session.update` | Update session config |
| `input_audio_buffer.append` | Send base64 PCM audio |
| `input_audio_buffer.commit` | Commit buffer (when VAD is disabled) |
| `input_audio_buffer.clear` | Clear buffer |
| `conversation.item.create` | Add a text message |
| `response.create` | Trigger a model response |
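With `turnDetection` disabled, the client drives each turn explicitly: append audio, commit the buffer, then request a response. A sketch of such a sequence, with event shapes following OpenAI's Realtime client-event schema (the message text here is illustrative):

```typescript
// Manual turn when VAD is disabled: append audio, commit the buffer,
// then explicitly ask the model to respond. The same envelope also
// carries text-only turns via conversation.item.create.
const events = [
  { type: 'input_audio_buffer.append', audio: '<base64 pcm chunk>' },
  { type: 'input_audio_buffer.commit' },
  {
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text: 'Summarize what I just said.' }],
    },
  },
  { type: 'response.create' },
];

// Each event is sent as a JSON string over the gateway's WebSocket:
const frames = events.map((e) => JSON.stringify(e));
// frames.forEach((f) => ws.send(f));
```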
## Server Events

You receive `{ event: 'realtime', data: <OpenAI server event> }`:

| Event | Description |
|---|---|
| `session.created` / `session.updated` | Session lifecycle |
| `response.output_audio.delta` | Audio chunk (base64) |
| `response.output_audio.done` | Audio complete |
| `response.output_text.delta` / `response.output_text.done` | Text stream |
| `response.done` | Response complete |
| `input_audio_buffer.speech_started` / `speech_stopped` | VAD events |
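Text arrives as incremental deltas. A small reducer (function and type names are illustrative) that assembles the full transcript from the `{ event: 'realtime', data }` envelopes the gateway forwards:

```typescript
// Accumulate response.output_text.delta chunks into a complete string.
// The envelope shape { event: 'realtime', data } matches what the gateway
// forwards; names here are illustrative, not part of @hazeljs/realtime.
type RealtimeEnvelope = { event: string; data: { type: string; delta?: string } };

function collectText(frames: RealtimeEnvelope[]): string {
  let text = '';
  for (const { event, data } of frames) {
    if (event !== 'realtime') continue; // ignore non-realtime envelopes
    if (data.type === 'response.output_text.delta' && data.delta) {
      text += data.delta;
    }
  }
  return text;
}
```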
## Audio Format

- Input: PCM 16-bit, 24kHz (or 8kHz for telephony)
- Output: PCM 16-bit, 24kHz

Audio is base64-encoded in both directions for transport over the WebSocket.
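For capacity planning, the bandwidth implied by this format is easy to work out: mono 16-bit PCM at 24kHz is 48,000 bytes per second raw, and base64 inflates that by a factor of 4/3 (every 3 bytes become 4 characters):

```typescript
// Raw bandwidth for mono PCM16 at 24 kHz, and the base64-inflated size
// actually sent over the WebSocket (base64 encodes 3 bytes as 4 chars).
const sampleRate = 24_000;      // samples per second
const bytesPerSample = 2;       // 16-bit PCM
const rawBytesPerSecond = sampleRate * bytesPerSample;
const base64BytesPerSecond = Math.ceil(rawBytesPerSecond / 3) * 4;
```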
## Use Cases
- Voice Assistants — Hands-free, low-latency voice interfaces
- Call Centers — Real-time AI agents with natural speech
- Accessibility — Voice-first interfaces
- Robotics — Voice control for devices
- Gaming — In-game voice NPCs
## API Reference

### RealtimeGateway

```typescript
class RealtimeGateway extends WebSocketGateway {
  constructor(realtimeService: RealtimeService, options?: RealtimeGatewayOptions);

  attachToServer(
    server: HttpServer,
    options?: { path?: string; maxPayload?: number },
  ): WebSocketServer;
}
```
### RealtimeService

```typescript
class RealtimeService {
  createOpenAISession(
    client: RealtimeClientAdapter,
    overrides?: {...},
  ): Promise<OpenAIRealtimeSession>;
  getSession(clientId: string): OpenAIRealtimeSession | undefined;
  removeSession(clientId: string): void;
  getStats(): RealtimeSessionStats[];
}
```
## What's Next?
- Explore WebSocket Package for the underlying gateway infrastructure
- Check out hazeljs-realtime-voice-starter for a complete example
- Read the OpenAI Realtime API documentation