PDF-to-Audio Package

@hazeljs/pdf-to-audio converts PDF documents into spoken audio using OpenAI TTS. Extract text from any PDF, optionally generate an AI summary, produce speech per chunk in parallel, and merge everything into a single MP3. Use it as a REST module, inject it as a service, or drive it from the CLI.

What Does It Solve?

Reading long PDFs is time-consuming. This package turns any PDF — reports, documentation, research papers, contracts — into an audiobook you can listen to. Key problems it solves:

  • Long documents — Chunking automatically handles PDFs whose text exceeds the TTS per-request input limit
  • Async processing — Large documents are processed in a Redis-backed job queue; callers poll for completion
  • Optional AI summaries — A pre-roll AI-generated summary gives listeners context before the full content
  • Multiple integration modes — REST API for services, programmatic service for direct use, CLI for scripts
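
The chunking step can be illustrated with a small sketch. This is not the package's actual chunker: the function name chunkText and the 4000-character default are assumptions chosen for illustration. It packs whole sentences into chunks that stay under a character budget and hard-splits any single sentence that is too long on its own.

```typescript
// Hypothetical helper (not part of @hazeljs/pdf-to-audio): split text into
// chunks of at most maxChars characters, preferring sentence boundaries.
export function chunkText(text: string, maxChars = 4000): string[] {
  if (!(maxChars >= 1)) throw new RangeError('maxChars must be at least 1');

  // A sentence ends at punctuation followed by whitespace or end of input,
  // so decimals such as "3.14" are not split. The second alternative
  // captures any trailing text without closing punctuation.
  const sentences = (text.match(/[\s\S]*?[.!?]+(?=\s|$)|[\s\S]+$/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = '';

  for (const sentence of sentences) {
    // Hard-split a sentence that alone exceeds the budget.
    for (let i = 0; i < sentence.length; i += maxChars) {
      const piece = sentence.slice(i, i + maxChars);
      const candidate = current ? `${current} ${piece}` : piece;
      if (candidate.length <= maxChars) {
        current = candidate;
      } else {
        chunks.push(current);
        current = piece;
      }
    }
  }

  if (current) chunks.push(current);
  return chunks;
}

// chunkText('One. Two. Three.', 10) → ['One. Two.', 'Three.']
```

Splitting on sentence boundaries rather than at a fixed offset tends to give the TTS model complete phrases, which usually sounds more natural at chunk joins.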

Why @hazeljs/pdf-to-audio?

| Challenge | Manual Approach | With pdf-to-audio |
| --- | --- | --- |
| Long PDFs exceeding TTS limit | Manual chunking per document | Automatic configurable chunking |
| Waiting for conversion | Blocking the HTTP request | Async job queue with polling |
| AI-generated summaries | Separate LLM call and concatenation | Built-in includeSummary option |
| Integration choice | One approach per use case | Module (REST), Service (DI), CLI |
| Multiple voices | Hard-code one voice | 6 OpenAI voices, selectable per request |
| Infrastructure overhead | Roll your own queue | Redis queue + worker included |

Architecture

Text extraction and TTS are sequential per chunk; the job queue decouples the upload from the (potentially long) conversion process:

graph TD
  A["PDF Upload"] --> B["Job Queue<br/>(Redis)"]
  B --> C["Worker"]
  C --> D["Extract Text<br/>(pdf-parse)"]
  D --> E["Chunk Text"]
  E --> F["AI Summary<br/>(optional)"]
  E --> G["OpenAI TTS<br/>(per chunk)"]
  F --> G
  G --> H["Audio Chunks"]
  H --> I["Merge Audio<br/>(ffmpeg / concat)"]
  I --> J["MP3 File<br/>(outputDir)"]
  K["GET /download/:jobId"] --> J
  
  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style B fill:#6366f1,stroke:#818cf8,stroke-width:2px,color:#fff
  style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style D fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style F fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style G fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style H fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
  style I fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
  style J fill:#6366f1,stroke:#818cf8,stroke-width:2px,color:#fff
  style K fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff

Key Components

  • PdfToAudioService — Core pipeline: extract → chunk → TTS per chunk → merge. Used by the worker and by direct programmatic calls.
  • Job Queue — Redis-backed async queue. POST /convert submits a job and returns { jobId }. A background worker processes the queue and writes the MP3 to outputDir.
  • REST Endpoints: POST /convert, GET /status/:jobId, GET /download/:jobId
  • CLI: hazel pdf-to-audio convert submits via the API; --wait polls until done and downloads
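
The overview mentions producing speech per chunk in parallel. One way to do that without flooding the TTS API is a bounded-concurrency map that keeps results in chunk order so the merge step can concatenate them directly. The sketch below is an illustration only, not the package's implementation; mapWithConcurrency and the synthesize callback named in the usage comment are hypothetical.

```typescript
// Hypothetical helper: run fn over items with at most `limit` calls in
// flight, returning results in the same order as the input.
export async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!(limit >= 1)) throw new RangeError('limit must be at least 1');

  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  // Each worker claims the next index, processes it, and repeats.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workers: Promise<void>[] = [];
  const count = Math.min(Math.floor(limit), items.length);
  for (let w = 0; w < count; w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}

// Usage sketch (synthesize is a stand-in for a per-chunk TTS call):
// const audioChunks = await mapWithConcurrency(textChunks, 3, (chunk) => synthesize(chunk));
```

Writing each result at its original index means a slow early chunk cannot reorder the final audio, even when later chunks finish first.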

Installation

npm install @hazeljs/pdf-to-audio @hazeljs/core @hazeljs/ai @hazeljs/queue @hazeljs/rag ioredis

Redis is required for the REST module and CLI (async job queue). Start Redis before using these modes. For direct programmatic use only, PdfToAudioService can be used without Redis.

Usage

Module — REST API

Register PdfToAudioModule in your app module and get three endpoints automatically:

import { HazelModule } from '@hazeljs/core';
import { PdfToAudioModule } from '@hazeljs/pdf-to-audio';

@HazelModule({
  imports: [
    PdfToAudioModule.forRoot({
      connection: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
        password: process.env.REDIS_PASSWORD,
      },
      outputDir: './data/pdf-to-audio',  // where MP3 files are stored
    }),
  ],
})
export class AppModule {}

Endpoints:

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/pdf-to-audio/convert | Submit PDF; returns { jobId } (202) |
| GET | /api/pdf-to-audio/status/:jobId | Returns job status |
| GET | /api/pdf-to-audio/download/:jobId | Download MP3 when completed |

POST /api/pdf-to-audio/convert accepts multipart/form-data:

| Field | Type | Description |
| --- | --- | --- |
| file | File | PDF file (required) |
| voice | string | TTS voice (default: alloy) |
| model | string | TTS model (default: tts-1) |
| includeSummary | string | "true" / "false" (default: "true") |
| summaryOnly | string | "true" to output summary only (default: "false") |
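
Because multipart form fields arrive as strings, the two boolean options need explicit parsing. The sketch below shows one way to turn the raw fields into typed values using the defaults listed above. How the module parses these fields internally is not documented here, so parseConvertFields and ParsedConvertFields are hypothetical names.

```typescript
// Hypothetical shape for the parsed form fields (not a package type).
interface ParsedConvertFields {
  voice: string;
  model: string;
  includeSummary: boolean;
  summaryOnly: boolean;
}

// Convert raw multipart string fields into typed options, applying the
// documented defaults when a field is missing.
export function parseConvertFields(
  body: Record<string, string | undefined>,
): ParsedConvertFields {
  return {
    voice: body.voice ?? 'alloy',
    model: body.model ?? 'tts-1',
    // Default is "true": only an explicit "false" disables the summary.
    includeSummary: body.includeSummary !== 'false',
    // Default is "false": only an explicit "true" enables summary-only mode.
    summaryOnly: body.summaryOnly === 'true',
  };
}
```

Comparing against the one string that flips each default keeps unexpected values such as "yes" or an empty string from silently changing behavior.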

GET /api/pdf-to-audio/status/:jobId returns:

{ "jobId": "...", "status": "pending" | "processing" | "completed" | "failed" }
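
A client that does not use the CLI's --wait flag has to poll this endpoint itself. The following is a minimal polling sketch, assuming the four status values shown above. waitForJob and its options are hypothetical names, and the status lookup is passed in as a function so the loop is independent of any particular HTTP client.

```typescript
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface PollOptions {
  intervalMs?: number; // delay between polls
  timeoutMs?: number; // overall budget before giving up
}

// Hypothetical helper: poll until the job completes, fails, or times out.
export async function waitForJob(
  getStatus: () => Promise<JobStatus>,
  { intervalMs = 2000, timeoutMs = 10 * 60 * 1000 }: PollOptions = {},
): Promise<'completed'> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getStatus();
    if (status === 'completed') return status;
    if (status === 'failed') throw new Error('Conversion job failed');
    // Stop if the next poll would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error('Timed out waiting for conversion job');
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage sketch against the status endpoint (base URL is an assumption):
// await waitForJob(async () => {
//   const res = await fetch(`http://localhost:3000/api/pdf-to-audio/status/${jobId}`);
//   return (await res.json()).status as JobStatus;
// });
```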

Service — Programmatic

Use PdfToAudioService directly in DI or standalone — no Redis, no queue:

import { PdfToAudioService } from '@hazeljs/pdf-to-audio';
import { OpenAIProvider } from '@hazeljs/ai';
import { readFileSync, writeFileSync } from 'fs';

const provider = new OpenAIProvider();
const service = new PdfToAudioService(provider);

const pdfBuffer = readFileSync('./annual-report.pdf');

// Convert entire PDF to audio
const audioBuffer = await service.convert(pdfBuffer, {
  voice: 'nova',
  model: 'tts-1-hd',    // higher quality
  includeSummary: true,  // prepend AI summary
  summaryOnly: false,    // read full document
});

// Write to disk
writeFileSync('./annual-report.mp3', audioBuffer);

For dependency injection:

import { Service } from '@hazeljs/core';
import { PdfToAudioService } from '@hazeljs/pdf-to-audio';

@Service()
export class DocumentProcessor {
  constructor(private readonly audioService: PdfToAudioService) {}

  async processUpload(pdfBuffer: Buffer): Promise<Buffer> {
    return this.audioService.convert(pdfBuffer, { voice: 'alloy' });
  }
}

CLI

The CLI communicates with a running API server. Submit a job, poll for completion, and download — all in one command:

# Convert and wait — blocks until done, saves to audio.mp3
hazel pdf-to-audio convert document.pdf \
  --api-url http://localhost:3000 \
  --wait \
  -o audio.mp3

# Submit only — prints the job ID and exits
hazel pdf-to-audio convert document.pdf \
  --api-url http://localhost:3000

# Check status and download when completed
hazel pdf-to-audio status <jobId> \
  --api-url http://localhost:3000 \
  -o audio.mp3

# Summary only, with a specific voice
hazel pdf-to-audio convert quarterly-report.pdf \
  --api-url http://localhost:3000 \
  --summary-only \
  --voice shimmer \
  --wait \
  -o summary.mp3

Complete Example: PDF Upload API

An Express endpoint that accepts a PDF upload and returns the audio synchronously (suitable for small PDFs):

import express from 'express';
import multer from 'multer';
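import { PdfToAudioService } from '@hazeljs/pdf-to-audio';
// ConvertOptions is assumed to be exported by the package; adjust the import if it lives elsewhere
import type { ConvertOptions } from '@hazeljs/pdf-to-audio';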
import { OpenAIProvider } from '@hazeljs/ai';

const app = express();
const upload = multer({ storage: multer.memoryStorage() });
const provider = new OpenAIProvider();
const audioService = new PdfToAudioService(provider);

app.post('/convert', upload.single('file'), async (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No PDF file provided' });
  }

  const { voice = 'alloy', includeSummary = 'true', summaryOnly = 'false' } = req.body;

  try {
    const audioBuffer = await audioService.convert(req.file.buffer, {
      voice: voice as ConvertOptions['voice'],
      includeSummary: includeSummary !== 'false',
      summaryOnly: summaryOnly === 'true',
    });

    res.set({
      'Content-Type': 'audio/mpeg',
      // Swap only a trailing .pdf extension (case-insensitive) for .mp3
      'Content-Disposition': `attachment; filename="${req.file.originalname.replace(/\.pdf$/i, '.mp3')}"`,
    });
    res.send(audioBuffer);
  } catch (err) {
    console.error('Conversion failed:', err);
    res.status(500).json({ error: 'PDF conversion failed' });
  }
});

app.listen(3000, () => console.log('PDF-to-audio API on :3000'));

For async processing of large PDFs, use the PdfToAudioModule REST API instead.

Environment

| Variable | Description | Required |
| --- | --- | --- |
| OPENAI_API_KEY | OpenAI API key for TTS | Yes |
| REDIS_HOST | Redis host for job queue | For module/CLI only |
| REDIS_PORT | Redis port (default: 6379) | For module/CLI only |

Options Reference

| Option | Description | Default |
| --- | --- | --- |
| voice | TTS voice: alloy, echo, fable, onyx, nova, shimmer | alloy |
| model | TTS model: tts-1 (fast), tts-1-hd (higher quality) | tts-1 |
| format | Output format: mp3, opus | mp3 |
| includeSummary | Prepend AI-generated summary at the start of the audio | true |
| summaryOnly | Output only the AI summary and skip reading the full document | false |

Voice Options

| Voice | Character |
| --- | --- |
| alloy | Neutral, versatile |
| echo | Warm male voice |
| fable | British accent |
| onyx | Deep, authoritative |
| nova | Friendly female voice |
| shimmer | Expressive female voice |
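
When the voice comes from user input, such as a form field or query string, it is worth validating it against this list before passing it on. The sketch below is one way to do that with a type guard. VOICES, isVoice, and resolveVoice are hypothetical names, not package exports.

```typescript
// The six voices listed in the table above.
const VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'] as const;
type Voice = (typeof VOICES)[number];

// Type guard: narrows an unknown value to one of the supported voices.
export function isVoice(value: unknown): value is Voice {
  return typeof value === 'string' && (VOICES as readonly string[]).indexOf(value) !== -1;
}

// Fall back to a default instead of forwarding an unsupported voice.
export function resolveVoice(value: unknown, fallback: Voice = 'alloy'): Voice {
  return isVoice(value) ? value : fallback;
}

// resolveVoice('nova')    → 'nova'
// resolveVoice('robotic') → 'alloy'
```

Whether to fall back silently or reject the request with a 400 response is a design choice; a public API may prefer rejecting so that callers notice typos.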

Job Lifecycle

POST /convert  →  { jobId: "abc123" }
                         ↓
GET /status/abc123  →  { status: "pending" }
                         ↓
GET /status/abc123  →  { status: "processing" }
                         ↓
GET /status/abc123  →  { status: "completed" }
                         ↓
GET /download/abc123  →  <MP3 file>

Files are stored in outputDir (default: ./data/pdf-to-audio) until the server is restarted. Implement your own cleanup logic or store to S3 if long-term persistence is needed.
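
As a starting point for the cleanup logic mentioned above, here is a minimal sketch that deletes MP3 files older than a given age from the output directory. It assumes generated files carry an .mp3 extension and that file modification time is an acceptable proxy for job age. selectExpired and cleanupOutputDir are hypothetical names, not package APIs.

```typescript
import { readdir, stat, unlink } from 'fs/promises';
import { join } from 'path';

interface FileInfo {
  name: string;
  mtimeMs: number; // last modification time in epoch milliseconds
}

// Pure selection step: names of .mp3 files last modified more than maxAgeMs ago.
export function selectExpired(
  files: readonly FileInfo[],
  nowMs: number,
  maxAgeMs: number,
): string[] {
  return files
    .filter((f) => /\.mp3$/i.test(f.name) && nowMs - f.mtimeMs > maxAgeMs)
    .map((f) => f.name);
}

// I/O step: scan outputDir, delete expired MP3 files, return the deleted names.
export async function cleanupOutputDir(outputDir: string, maxAgeMs: number): Promise<string[]> {
  const infos: FileInfo[] = [];
  for (const name of await readdir(outputDir)) {
    const s = await stat(join(outputDir, name));
    if (s.isFile()) infos.push({ name, mtimeMs: s.mtimeMs });
  }
  const expired = selectExpired(infos, Date.now(), maxAgeMs);
  await Promise.all(expired.map((name) => unlink(join(outputDir, name))));
  return expired;
}

// Usage sketch: remove files older than 24 hours, once per hour.
// setInterval(() => cleanupOutputDir('./data/pdf-to-audio', 24 * 60 * 60 * 1000), 60 * 60 * 1000);
```

Keeping the selection logic separate from the file system calls makes the retention rule easy to unit test without touching disk.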

Related

  • AI Package — OpenAI and other LLM provider integrations
  • Queue Package — The job queue system powering async processing
  • CLI Package — The hazel pdf-to-audio CLI commands

For full API reference, see the PDF-to-Audio package on GitHub.