HazelJS PDF-to-Audio Package

@hazeljs/pdf-to-audio converts PDF documents into spoken audio using OpenAI TTS — extract text, optionally summarize with AI, produce speech per chunk in parallel, and merge into a single MP3.

Quick Reference

Purpose: @hazeljs/pdf-to-audio converts PDF documents into spoken audio (MP3) using OpenAI TTS, with optional AI summarization, parallel chunk processing, and audio merging.
When to use: Use @hazeljs/pdf-to-audio to turn PDF reports, documentation, or research papers into audiobooks. Available as a REST module, an injectable service, or a CLI.
Key concepts: PDF text extraction, AI summarization, OpenAI TTS, chunk-based speech generation, parallel processing, audio merging (MP3).
Dependencies: @hazeljs/core, @hazeljs/ai, OpenAI API key.
Common patterns: Upload PDF → extract text → optionally summarize → generate speech per chunk in parallel → merge into MP3 → return audio file.
Common mistakes: Not handling large PDFs (chunk size limits); not setting appropriate voice/speed for TTS; using synchronous processing for large documents (use background jobs).

What Does It Solve?

Reading long PDFs is time-consuming. This package turns any PDF — reports, documentation, research papers, contracts — into an audiobook you can listen to. Key problems it solves:

Long documents — Chunking handles PDFs that exceed the TTS token limit automatically
Async processing — Large documents are processed in a Redis-backed job queue; callers poll for completion
Optional AI summaries — A pre-roll AI-generated summary gives listeners context before the full content
Multiple integration modes — REST API for services, programmatic service for direct use, CLI for scripts
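
The chunking step can be pictured with a small sketch. This is not the package's internal implementation; `chunkText` and its `maxChars` parameter are hypothetical names used only for illustration. It splits on sentence boundaries where possible and hard-splits any single sentence that is longer than the limit.

```typescript
// Hypothetical helper, not part of the @hazeljs/pdf-to-audio public API.
// Splits text into chunks of at most maxChars, preferring sentence boundaries.
export function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError('maxChars must be positive');

  // A "sentence" is a run of non-terminator characters plus trailing . ! or ?
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = '';

  for (const sentence of sentences) {
    // Hard-split any single sentence that is longer than the limit.
    const pieces: string[] = [];
    for (let i = 0; i < sentence.length; i += maxChars) {
      pieces.push(sentence.slice(i, i + maxChars));
    }

    for (const piece of pieces) {
      const candidate = current ? `${current} ${piece}` : piece;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits, keep growing the chunk
      } else {
        chunks.push(current); // flush and start a new chunk
        current = piece;
      }
    }
  }

  if (current) chunks.push(current);
  return chunks;
}
```

Keeping sentences intact where possible avoids audible breaks in the middle of a phrase when the per-chunk audio is merged.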

Why @hazeljs/pdf-to-audio?

Challenge	Manual Approach	With pdf-to-audio
Long PDFs exceeding TTS limit	Manual chunking per document	Automatic configurable chunking
Waiting for conversion	Blocking the HTTP request	Async job queue with polling
AI-generated summaries	Separate LLM call and concatenation	Built-in `includeSummary` option
Integration choice	One approach per use case	Module (REST), Service (DI), CLI
Multiple voices	Hard-code one voice	Choose any of 6 OpenAI voices per request
Infrastructure overhead	Roll your own queue	Redis queue + worker included

Architecture

The job queue decouples the upload from the (potentially long) conversion. Inside the worker, the stages run in order (extract, chunk, optional summary, TTS, merge), and the TTS stage generates speech for the chunks in parallel:

graph TD
  A["PDF Upload"] --> B["Job Queue<br/>(Redis)"]
  B --> C["Worker"]
  C --> D["Extract Text<br/>(pdf-parse)"]
  D --> E["Chunk Text"]
  E --> F["AI Summary<br/>(optional)"]
  E --> G["OpenAI TTS<br/>(per chunk)"]
  F --> G
  G --> H["Audio Chunks"]
  H --> I["Merge Audio<br/>(ffmpeg / concat)"]
  I --> J["MP3 File<br/>(outputDir)"]
  K["GET /download/:jobId"] --> J
  
  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style B fill:#6366f1,stroke:#818cf8,stroke-width:2px,color:#fff
  style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style D fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style F fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style G fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style H fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
  style I fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
  style J fill:#6366f1,stroke:#818cf8,stroke-width:2px,color:#fff
  style K fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff

Key Components

PdfToAudioService — Core pipeline: extract → chunk → TTS per chunk → merge. Used by the worker and by direct programmatic calls.
Job Queue — Redis-backed async queue. POST /convert submits a job and returns { jobId }. A background worker processes the queue and writes the MP3 to outputDir.
REST Endpoints — POST /convert, GET /status/:jobId, GET /download/:jobId
CLI — hazel pdf-to-audio convert submits via the API; --wait polls until done and downloads
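
As a rough illustration of the "TTS per chunk → merge" part of the pipeline, the sketch below runs a per-chunk synthesis function with bounded concurrency and concatenates the results in the original order. The `Synthesize` type, `synthesizeAll`, and the `concurrency` default are assumptions for this example; the real service may schedule and merge audio differently (the diagram above mentions ffmpeg as one merge option).

```typescript
// Hypothetical signature for a per-chunk TTS call; the real service's internals may differ.
type Synthesize = (chunk: string) => Promise<Uint8Array>;

export async function synthesizeAll(
  chunks: string[],
  synthesize: Synthesize,
  concurrency = 4,
): Promise<Uint8Array> {
  const results: Uint8Array[] = new Array(chunks.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < chunks.length) {
      const index = next++;
      results[index] = await synthesize(chunks[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, chunks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));

  // Naive merge: byte-wise concatenation in chunk order.
  const total = results.reduce((sum, part) => sum + part.length, 0);
  const merged = new Uint8Array(total);
  let offset = 0;
  for (const part of results) {
    merged.set(part, offset);
    offset += part.length;
  }
  return merged;
}
```

Writing each result to its own index keeps the audio in document order even when later chunks finish first. A concurrency cap also helps stay under provider rate limits.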

Installation

npm install @hazeljs/pdf-to-audio @hazeljs/core @hazeljs/ai @hazeljs/queue @hazeljs/rag ioredis

Redis is required for the REST module and CLI (async job queue). Start Redis before using these modes. For direct programmatic use only, PdfToAudioService can be used without Redis.

Usage

Module — REST API

import { HazelModule } from '@hazeljs/core';
import { PdfToAudioModule } from '@hazeljs/pdf-to-audio';

@HazelModule({
  imports: [
    PdfToAudioModule.forRoot({
      connection: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
        password: process.env.REDIS_PASSWORD,
      },
      outputDir: './data/pdf-to-audio',  // where MP3 files are stored
    }),
  ],
})
export class AppModule {}

Endpoints:

Method	Path	Description
`POST`	`/api/pdf-to-audio/convert`	Submit PDF; returns `{ jobId }` (202)
`GET`	`/api/pdf-to-audio/status/:jobId`	Returns job status
`GET`	`/api/pdf-to-audio/download/:jobId`	Download MP3 when completed

POST /api/pdf-to-audio/convert accepts multipart/form-data:

Field	Type	Description
`file`	File	PDF file (required)
`voice`	string	TTS voice (default: `alloy`)
`model`	string	TTS model (default: `tts-1`)
`includeSummary`	string	`"true"` / `"false"` (default: `"true"`)
`summaryOnly`	string	`"true"` to output summary only (default: `"false"`)

GET /api/pdf-to-audio/status/:jobId returns:

{ "jobId": "...", "status": "pending" | "processing" | "completed" | "failed" }
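
On the client, the status payload can be modelled as a TypeScript type with a runtime guard. The type and function names here are illustrative assumptions, not exports of the package; the four status values are the ones shown above.

```typescript
// Illustrative client-side types; the package may export its own.
export type JobState = 'pending' | 'processing' | 'completed' | 'failed';

export interface JobStatusResponse {
  jobId: string;
  status: JobState;
}

const JOB_STATES: readonly string[] = ['pending', 'processing', 'completed', 'failed'];

// Runtime check for an untyped JSON payload.
export function isJobStatusResponse(value: unknown): value is JobStatusResponse {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.jobId === 'string' &&
    typeof record.status === 'string' &&
    JOB_STATES.includes(record.status)
  );
}
```

Validating the payload before branching on `status` keeps a malformed or unexpected response from being treated as a finished job.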

Service — Programmatic

Use PdfToAudioService directly in DI or standalone — no Redis, no queue:

import { PdfToAudioService } from '@hazeljs/pdf-to-audio';
import { OpenAIProvider } from '@hazeljs/ai';
import { readFileSync, writeFileSync } from 'fs';

const provider = new OpenAIProvider();
const service = new PdfToAudioService(provider);

const pdfBuffer = readFileSync('./annual-report.pdf');

// Convert entire PDF to audio
const audioBuffer = await service.convert(pdfBuffer, {
  voice: 'nova',
  model: 'tts-1-hd',    // higher quality
  includeSummary: true,  // prepend AI summary
  summaryOnly: false,    // read full document
});

// Write to disk
writeFileSync('./annual-report.mp3', audioBuffer);

For dependency injection:

import { Service } from '@hazeljs/core';
import { PdfToAudioService } from '@hazeljs/pdf-to-audio';

@Service()
export class DocumentProcessor {
  constructor(private readonly audioService: PdfToAudioService) {}

  async processUpload(pdfBuffer: Buffer): Promise<Buffer> {
    return this.audioService.convert(pdfBuffer, { voice: 'alloy' });
  }
}

CLI

The CLI communicates with a running API server. Submit a job, poll for completion, and download — all in one command:

# Convert and wait — blocks until done, saves to audio.mp3
hazel pdf-to-audio convert document.pdf \
  --api-url http://localhost:3000 \
  --wait \
  -o audio.mp3

# Submit only — prints the job ID and exits
hazel pdf-to-audio convert document.pdf \
  --api-url http://localhost:3000

# Check status and download when completed
hazel pdf-to-audio status <jobId> \
  --api-url http://localhost:3000 \
  -o audio.mp3

# Summary only, with a specific voice
hazel pdf-to-audio convert quarterly-report.pdf \
  --api-url http://localhost:3000 \
  --summary-only \
  --voice shimmer \
  --wait \
  -o summary.mp3

Complete Example: PDF Upload API

An Express endpoint that accepts a PDF upload and returns the audio synchronously (suitable for small PDFs):

import express from 'express';
import multer from 'multer';
// ConvertOptions is assumed to be exported alongside the service
import { PdfToAudioService, type ConvertOptions } from '@hazeljs/pdf-to-audio';
import { OpenAIProvider } from '@hazeljs/ai';

const app = express();
const upload = multer({ storage: multer.memoryStorage() });
const provider = new OpenAIProvider();
const audioService = new PdfToAudioService(provider);

app.post('/convert', upload.single('file'), async (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No PDF file provided' });
  }

  const { voice = 'alloy', includeSummary = 'true', summaryOnly = 'false' } = req.body;

  try {
    const audioBuffer = await audioService.convert(req.file.buffer, {
      voice: voice as ConvertOptions['voice'],
      includeSummary: includeSummary !== 'false',
      summaryOnly: summaryOnly === 'true',
    });

    res.set({
      'Content-Type': 'audio/mpeg',
      // Swap only a trailing .pdf extension (case-insensitive) for .mp3
      'Content-Disposition': `attachment; filename="${req.file.originalname.replace(/\.pdf$/i, '.mp3')}"`,
    });
    res.send(audioBuffer);
  } catch (err) {
    console.error('Conversion failed:', err);
    res.status(500).json({ error: 'PDF conversion failed' });
  }
});

app.listen(3000, () => console.log('PDF-to-audio API on :3000'));

For async processing of large PDFs, use the PdfToAudioModule REST API instead.

Environment

Variable	Description	Required
`OPENAI_API_KEY`	OpenAI API key for TTS	Yes
`REDIS_HOST`	Redis host for job queue	For module/CLI only
`REDIS_PORT`	Redis port (default: `6379`)	For module/CLI only
`REDIS_PASSWORD`	Redis password, if your instance requires one	For module/CLI only
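
A minimal sketch of validating these variables at startup, so that a missing key fails fast instead of surfacing on the first conversion. `loadEnv` and the returned shape are hypothetical; the `localhost` host default mirrors the module example above rather than a documented package default.

```typescript
// Hypothetical startup check; names and shape are for illustration only.
export interface PdfToAudioEnv {
  openaiApiKey: string;
  redis: { host: string; port: number; password: string | undefined };
}

export function loadEnv(env: Record<string, string | undefined>): PdfToAudioEnv {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) throw new Error('OPENAI_API_KEY is required');

  // Number() rejects trailing garbage such as "6379abc", unlike parseInt().
  const port = Number(env.REDIS_PORT ?? '6379');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`REDIS_PORT is not a valid port: ${env.REDIS_PORT}`);
  }

  return {
    openaiApiKey,
    redis: {
      host: env.REDIS_HOST ?? 'localhost',
      port,
      password: env.REDIS_PASSWORD,
    },
  };
}

// Usage: const config = loadEnv(process.env);
```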

Options Reference

Option	Description	Default
`voice`	TTS voice: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`	`alloy`
`model`	TTS model: `tts-1` (fast), `tts-1-hd` (higher quality)	`tts-1`
`format`	Output format: `mp3`, `opus`	`mp3`
`includeSummary`	Prepend AI-generated summary at the start of the audio	`true`
`summaryOnly`	Output only the AI summary — skip reading the full document	`false`
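
The table can be read as a typed options object with defaults. The sketch below is an illustration only: `ConvertOptionsSketch` and `resolveOptions` are made-up names, and the package's own option type may be shaped differently. The values and defaults are taken from the table.

```typescript
// Illustrative types mirroring the options table; not the package's own definitions.
type Voice = 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer';

interface ConvertOptionsSketch {
  voice: Voice;
  model: 'tts-1' | 'tts-1-hd';
  format: 'mp3' | 'opus';
  includeSummary: boolean;
  summaryOnly: boolean;
}

const DEFAULTS: ConvertOptionsSketch = {
  voice: 'alloy',
  model: 'tts-1',
  format: 'mp3',
  includeSummary: true,
  summaryOnly: false,
};

// Fills in the documented defaults for any option the caller left out.
export function resolveOptions(partial: Partial<ConvertOptionsSketch> = {}): ConvertOptionsSketch {
  return {
    voice: partial.voice ?? DEFAULTS.voice,
    model: partial.model ?? DEFAULTS.model,
    format: partial.format ?? DEFAULTS.format,
    includeSummary: partial.includeSummary ?? DEFAULTS.includeSummary,
    summaryOnly: partial.summaryOnly ?? DEFAULTS.summaryOnly,
  };
}
```

Using `??` rather than `||` matters for the boolean options: an explicit `includeSummary: false` is kept instead of being replaced by the default.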

Voice Options

Voice	Character
`alloy`	Neutral, versatile
`echo`	Warm male voice
`fable`	British accent
`onyx`	Deep, authoritative
`nova`	Friendly female voice
`shimmer`	Expressive female voice

Job Lifecycle

POST /convert  →  { jobId: "abc123" }
                         ↓
GET /status/abc123  →  { status: "pending" }
                         ↓
GET /status/abc123  →  { status: "processing" }
                         ↓
GET /status/abc123  →  { status: "completed" }
                         ↓
GET /download/abc123  →  <MP3 file>
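
The lifecycle above maps to a simple polling loop on the client. The sketch below assumes a caller-supplied `getStatus` function (for example, one that wraps `GET /status/:jobId`); `waitForJob`, the interval, and the timeout are hypothetical choices, not package behaviour.

```typescript
type JobState = 'pending' | 'processing' | 'completed' | 'failed';

// getStatus is injected so the loop works with fetch, another HTTP client, or a test stub.
export async function waitForJob(
  getStatus: () => Promise<JobState>,
  { intervalMs = 2000, timeoutMs = 10 * 60 * 1000 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;

  for (;;) {
    const status = await getStatus();
    if (status === 'completed') return;
    if (status === 'failed') throw new Error('Conversion job failed');

    // Give up if the next wait would run past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error('Timed out waiting for conversion job');
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage sketch (assumes Node 18+ fetch and a server on http://localhost:3000):
// await waitForJob(async () => {
//   const res = await fetch(`http://localhost:3000/api/pdf-to-audio/status/${jobId}`);
//   return (await res.json()).status;
// });
// Then request GET /api/pdf-to-audio/download/:jobId to fetch the MP3.
```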

Files are stored in outputDir (default: ./data/pdf-to-audio) until the server is restarted. Implement your own cleanup logic or store to S3 if long-term persistence is needed.
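
One possible shape for the cleanup logic is an age-based sweep of the output directory, run on a timer or from a cron job. This sketch assumes the generated files carry an `.mp3` extension and live directly in `outputDir`; `cleanupOldAudio` is a hypothetical helper, not something the package provides.

```typescript
import { readdir, stat, unlink } from 'node:fs/promises';
import { join } from 'node:path';

// Deletes .mp3 files in dir whose last modification is older than maxAgeMs.
// Returns the names of the files that were removed.
export async function cleanupOldAudio(
  dir: string,
  maxAgeMs: number,
  now: number = Date.now(),
): Promise<string[]> {
  const removed: string[] = [];

  for (const name of await readdir(dir)) {
    if (!name.toLowerCase().endsWith('.mp3')) continue; // leave other files alone

    const filePath = join(dir, name);
    const info = await stat(filePath);
    if (info.isFile() && now - info.mtimeMs > maxAgeMs) {
      await unlink(filePath);
      removed.push(name);
    }
  }
  return removed;
}

// Usage sketch: remove audio older than 24 hours, once an hour.
// setInterval(() => cleanupOldAudio('./data/pdf-to-audio', 24 * 60 * 60 * 1000), 60 * 60 * 1000);
```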

AI Package — OpenAI and other LLM provider integrations
Queue Package — The job queue system powering async processing
CLI Package — The hazel pdf-to-audio CLI commands

Recipes

Recipe: Convert a PDF to Audio via REST

// File: src/audio/audio.controller.ts
import { Controller, Post, UploadedFile, UseInterceptors, Res, FileInterceptor } from '@hazeljs/core';
import { PdfToAudioService } from '@hazeljs/pdf-to-audio';
import { Response } from 'express';

@Controller('audio')
export class AudioController {
  constructor(private readonly pdfToAudio: PdfToAudioService) {}

  @Post('convert')
  @UseInterceptors(FileInterceptor('file'))
  async convert(@UploadedFile() file: Express.Multer.File, @Res() res: Response) {
    // convert(buffer, options) resolves to the MP3 as a Buffer (see the Service section above)
    const audioBuffer = await this.pdfToAudio.convert(file.buffer, {
      voice: 'nova',
      includeSummary: true,
    });

    res.setHeader('Content-Type', 'audio/mpeg');
    res.setHeader('Content-Disposition', 'attachment; filename="output.mp3"');
    res.send(audioBuffer);
  }
}

Recipe: Background PDF Conversion with Queue

// File: src/audio/audio-job.service.ts
import { Service } from '@hazeljs/core';
import { PdfToAudioService } from '@hazeljs/pdf-to-audio';
import { QueueService } from '@hazeljs/queue';

@Service()
export class AudioJobService {
  constructor(
    private readonly pdfToAudio: PdfToAudioService,
    private readonly queue: QueueService,
  ) {}

  async enqueueConversion(pdfBuffer: Buffer, userId: string) {
    return this.queue.add('pdf-to-audio', {
      buffer: pdfBuffer.toString('base64'),
      userId,
      voice: 'alloy',
    });
  }
}
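
The recipe above base64-encodes the PDF, presumably because queue payloads are serialized as JSON, which cannot carry raw bytes. The sketch below shows the matching encode and decode steps a producer and worker would need; `PdfToAudioJob`, `encodeJob`, and `decodeJob` are hypothetical names for illustration.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical job payload shape, matching the fields used in the recipe above.
interface PdfToAudioJob {
  buffer: string; // base64-encoded PDF bytes
  userId: string;
  voice: string;
}

// Producer side: make the PDF safe to serialize as JSON.
export function encodeJob(pdf: Buffer, userId: string, voice = 'alloy'): PdfToAudioJob {
  return { buffer: pdf.toString('base64'), userId, voice };
}

// Worker side: recover the original bytes before calling the converter.
export function decodeJob(job: PdfToAudioJob): { pdf: Buffer; userId: string; voice: string } {
  return { pdf: Buffer.from(job.buffer, 'base64'), userId: job.userId, voice: job.voice };
}
```

Base64 grows the payload by about a third, so for very large PDFs it may be preferable to store the file on disk or in object storage and enqueue only a reference to it.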

For full API reference, see the PDF-to-Audio package on GitHub.