PDF-to-Audio Package
@hazeljs/pdf-to-audio converts PDF documents into spoken audio using OpenAI TTS. It extracts the text from a PDF, optionally generates an AI summary, produces speech for each chunk in parallel, and merges everything into a single MP3. Use it as a REST module, inject it as a service, or drive it from the CLI.
What Does It Solve?
Reading long PDFs is time-consuming. This package turns any PDF — reports, documentation, research papers, contracts — into an audiobook you can listen to. Key problems it solves:
- Long documents: text is split into chunks automatically, so PDFs that exceed the TTS input limit (4,096 characters per OpenAI speech request) need no manual splitting
- Async processing: large documents are processed in a Redis-backed job queue; callers poll for completion
- Optional AI summaries: a pre-roll AI-generated summary gives listeners context before the full content
- Multiple integration modes: REST API for services, programmatic service for direct use, CLI for scripts
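To make "automatic chunking" concrete, here is a sketch that packs whole sentences into chunks no longer than a character limit and hard-splits any single sentence that is too long on its own. It is not the package's implementation; `chunkText` is a hypothetical name, and the 4,096-character default simply mirrors the input cap of OpenAI's speech endpoint.

```typescript
// Hypothetical helper: sentence-aware chunking, a sketch rather than the package's code.
// OpenAI's speech endpoint accepts at most 4,096 characters of input per request.
const DEFAULT_MAX_CHARS = 4096;

export function chunkText(text: string, maxChars: number = DEFAULT_MAX_CHARS): string[] {
  // Collapse line breaks and repeated spaces left behind by PDF text extraction.
  const normalized = text.replace(/\s+/g, ' ').trim();
  if (normalized.length === 0) return [];

  // Each match is a run of text ending in . ! or ?, or the unpunctuated tail.
  const sentences = (normalized.match(/[^.!?]*[.!?]+|[^.!?]+$/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = '';

  for (const sentence of sentences) {
    // A sentence longer than the limit is cut into fixed-size pieces.
    const pieces: string[] = [];
    for (let i = 0; i < sentence.length; i += maxChars) {
      pieces.push(sentence.slice(i, i + maxChars));
    }
    for (const piece of pieces) {
      const candidate = current ? `${current} ${piece}` : piece;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits: keep growing the current chunk
      } else {
        chunks.push(current); // full: emit it and start a new chunk
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Splitting on sentence boundaries keeps the narration from breaking mid-sentence between two TTS requests; the hard split is only a fallback.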
Why @hazeljs/pdf-to-audio?
| Challenge | Manual Approach | With pdf-to-audio |
|---|---|---|
| Long PDFs exceeding TTS limit | Manual chunking per document | Automatic configurable chunking |
| Waiting for conversion | Blocking the HTTP request | Async job queue with polling |
| AI-generated summaries | Separate LLM call and concatenation | Built-in includeSummary option |
| Integration choice | One approach per use case | Module (REST), Service (DI), CLI |
| Multiple voices | Hard-code one voice | Any of 6 OpenAI voices, chosen per request |
| Infrastructure overhead | Roll your own queue | Redis queue + worker included |
Architecture
Within a job, the stages run in order: text is extracted and chunked, speech is generated for the chunks, and the audio is merged at the end. The job queue decouples the upload from this (potentially long) conversion process:
```mermaid
graph TD
    A["PDF Upload"] --> B["Job Queue<br/>(Redis)"]
    B --> C["Worker"]
    C --> D["Extract Text<br/>(pdf-parse)"]
    D --> E["Chunk Text"]
    E --> F["AI Summary<br/>(optional)"]
    E --> G["OpenAI TTS<br/>(per chunk)"]
    F --> G
    G --> H["Audio Chunks"]
    H --> I["Merge Audio<br/>(ffmpeg / concat)"]
    I --> J["MP3 File<br/>(outputDir)"]
    K["GET /download/:jobId"] --> J
    style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
    style B fill:#6366f1,stroke:#818cf8,stroke-width:2px,color:#fff
    style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
    style D fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
    style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
    style F fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
    style G fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
    style H fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
    style I fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
    style J fill:#6366f1,stroke:#818cf8,stroke-width:2px,color:#fff
    style K fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
```
Key Components
- PdfToAudioService: the core pipeline (extract → chunk → TTS per chunk → merge). Used by the worker and by direct programmatic calls.
- Job Queue: a Redis-backed async queue. POST /convert submits a job and returns { jobId }; a background worker processes the queue and writes the MP3 to outputDir.
- REST Endpoints: POST /convert, GET /status/:jobId, GET /download/:jobId
- CLI: hazel pdf-to-audio convert submits a job via the API; --wait polls until the job is done and downloads the result.
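The "TTS per chunk, then merge" step can be sketched as a bounded parallel fan-out that writes each result back into its original slot, followed by a merge. Here `synthesizeAll` and the `synthesize` callback are hypothetical stand-ins (with the OpenAI Node SDK the callback would wrap a speech request), the concurrency limit of 4 is an arbitrary assumption, and plain `Buffer.concat` stands in for the merge; the package may use ffmpeg instead, as the diagram indicates.

```typescript
// Hypothetical sketch: synthesize chunks in parallel (bounded), merge in original order.
type Synthesize = (text: string) => Promise<Buffer>;

export async function synthesizeAll(
  chunks: string[],
  synthesize: Synthesize,
  concurrency = 4, // assumption: keeps simultaneous TTS requests under rate limits
): Promise<Buffer> {
  const results: Buffer[] = new Array(chunks.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  // The check and increment run without an await in between, so no index is claimed twice.
  async function worker(): Promise<void> {
    while (next < chunks.length) {
      const index = next++;
      results[index] = await synthesize(chunks[index]);
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, chunks.length) }, () => worker());
  await Promise.all(workers);

  // Naive merge: MP3 streams usually play when concatenated byte-for-byte, but some
  // players report a wrong duration; re-muxing with ffmpeg yields a cleaner file.
  return Buffer.concat(results);
}
```

Because results are stored by index rather than by completion time, chunks that finish early never change the reading order.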
Installation
```bash
npm install @hazeljs/pdf-to-audio @hazeljs/core @hazeljs/ai @hazeljs/queue @hazeljs/rag ioredis
```
Redis is required for the REST module and the CLI, which rely on the async job queue. Start Redis before using either mode. If you only call PdfToAudioService directly, Redis is not needed.
Usage
Module — REST API
Register PdfToAudioModule in your app module and get three endpoints automatically:
```typescript
import { HazelModule } from '@hazeljs/core';
import { PdfToAudioModule } from '@hazeljs/pdf-to-audio';

@HazelModule({
  imports: [
    PdfToAudioModule.forRoot({
      connection: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
        password: process.env.REDIS_PASSWORD,
      },
      outputDir: './data/pdf-to-audio', // where MP3 files are stored
    }),
  ],
})
export class AppModule {}
```
Endpoints:
| Method | Path | Description |
|---|---|---|
| POST | /api/pdf-to-audio/convert | Submit PDF; returns { jobId } (202) |
| GET | /api/pdf-to-audio/status/:jobId | Returns job status |
| GET | /api/pdf-to-audio/download/:jobId | Download MP3 when completed |
POST /api/pdf-to-audio/convert accepts multipart/form-data:
| Field | Type | Description |
|---|---|---|
| file | File | PDF file (required) |
| voice | string | TTS voice (default: alloy) |
| model | string | TTS model (default: tts-1) |
| includeSummary | string | "true" / "false" (default: "true") |
| summaryOnly | string | "true" to output the summary only (default: "false") |
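A client can build this request with the FormData and Blob globals available in Node 18+ and in browsers. The sketch below is an assumption about client usage, not part of the package; `buildConvertForm` is a hypothetical helper that sends booleans as the "true" / "false" strings the endpoint expects and omits unset fields so server defaults apply.

```typescript
// Hypothetical client-side helper for POST /api/pdf-to-audio/convert.
interface ConvertFields {
  voice?: string;
  model?: string;
  includeSummary?: boolean;
  summaryOnly?: boolean;
}

export function buildConvertForm(pdf: Blob, filename: string, fields: ConvertFields = {}): FormData {
  const form = new FormData();
  form.append('file', pdf, filename);
  if (fields.voice) form.append('voice', fields.voice);
  if (fields.model) form.append('model', fields.model);
  // Multipart fields are strings, so booleans are sent as "true" / "false".
  if (fields.includeSummary !== undefined) form.append('includeSummary', String(fields.includeSummary));
  if (fields.summaryOnly !== undefined) form.append('summaryOnly', String(fields.summaryOnly));
  return form;
}

// Usage (assumes an API server on localhost:3000 and a PDF Blob in `pdfBlob`):
// const res = await fetch('http://localhost:3000/api/pdf-to-audio/convert', {
//   method: 'POST',
//   body: buildConvertForm(pdfBlob, 'report.pdf', { voice: 'nova', summaryOnly: true }),
// });
// const { jobId } = await res.json();
```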
GET /api/pdf-to-audio/status/:jobId returns:
```
{ "jobId": "...", "status": "pending" | "processing" | "completed" | "failed" }
```
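Callers that do not use the CLI need their own polling loop. A minimal sketch using the global fetch of Node 18+; the 2-second interval, 10-minute timeout, and the helper names `httpStatusGetter` and `waitForJob` are assumptions. The status getter is passed in so the loop can be exercised without a running server.

```typescript
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface StatusResponse {
  jobId: string;
  status: JobStatus;
}

// Hypothetical getter that calls the status endpoint documented above.
export function httpStatusGetter(baseUrl: string) {
  return async (jobId: string): Promise<StatusResponse> => {
    const res = await fetch(`${baseUrl}/api/pdf-to-audio/status/${encodeURIComponent(jobId)}`);
    if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);
    return (await res.json()) as StatusResponse;
  };
}

// Poll until the job reaches a terminal state ("completed" or "failed") or the timeout elapses.
export async function waitForJob(
  jobId: string,
  getStatus: (jobId: string) => Promise<StatusResponse>,
  intervalMs = 2000,
  timeoutMs = 10 * 60 * 1000,
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const { status } = await getStatus(jobId);
    if (status === 'completed' || status === 'failed') return status;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job ${jobId} still ${status} after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage: const status = await waitForJob(jobId, httpStatusGetter('http://localhost:3000'));
```

Once the loop returns "completed", a GET to /api/pdf-to-audio/download/:jobId retrieves the MP3.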
Service — Programmatic
Use PdfToAudioService directly in DI or standalone — no Redis, no queue:
```typescript
import { PdfToAudioService } from '@hazeljs/pdf-to-audio';
import { OpenAIProvider } from '@hazeljs/ai';
import { readFileSync, writeFileSync } from 'fs';

// Assumes OPENAI_API_KEY is set (see Environment).
const provider = new OpenAIProvider();
const service = new PdfToAudioService(provider);
const pdfBuffer = readFileSync('./annual-report.pdf');

// Convert entire PDF to audio (top-level await requires an ES module)
const audioBuffer = await service.convert(pdfBuffer, {
  voice: 'nova',
  model: 'tts-1-hd', // higher quality
  includeSummary: true, // prepend AI summary
  summaryOnly: false, // read full document
});

// Write to disk
writeFileSync('./annual-report.mp3', audioBuffer);
```
For dependency injection:
```typescript
import { Service } from '@hazeljs/core';
import { PdfToAudioService } from '@hazeljs/pdf-to-audio';

@Service()
export class DocumentProcessor {
  constructor(private readonly audioService: PdfToAudioService) {}

  async processUpload(pdfBuffer: Buffer): Promise<Buffer> {
    return this.audioService.convert(pdfBuffer, { voice: 'alloy' });
  }
}
```
CLI
The CLI communicates with a running API server. Submit a job, poll for completion, and download — all in one command:
```bash
# Convert and wait — blocks until done, saves to audio.mp3
hazel pdf-to-audio convert document.pdf \
  --api-url http://localhost:3000 \
  --wait \
  -o audio.mp3

# Submit only — prints the job ID and exits
hazel pdf-to-audio convert document.pdf \
  --api-url http://localhost:3000

# Check status and download when completed
hazel pdf-to-audio status <jobId> \
  --api-url http://localhost:3000 \
  -o audio.mp3

# Summary only, with a specific voice
hazel pdf-to-audio convert quarterly-report.pdf \
  --api-url http://localhost:3000 \
  --summary-only \
  --voice shimmer \
  --wait \
  -o summary.mp3
```
Complete Example: PDF Upload API
An Express endpoint that accepts a PDF upload and returns the audio synchronously (suitable for small PDFs):
```typescript
import express from 'express';
import multer from 'multer';
// ConvertOptions is assumed to be exported by the package alongside the service.
import { PdfToAudioService, type ConvertOptions } from '@hazeljs/pdf-to-audio';
import { OpenAIProvider } from '@hazeljs/ai';

const app = express();
// Keep uploads in memory so the buffer can be passed straight to the service.
const upload = multer({ storage: multer.memoryStorage() });
const provider = new OpenAIProvider();
const audioService = new PdfToAudioService(provider);

app.post('/convert', upload.single('file'), async (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No PDF file provided' });
  }
  // Multipart text fields arrive as strings.
  const { voice = 'alloy', includeSummary = 'true', summaryOnly = 'false' } = req.body;
  try {
    const audioBuffer = await audioService.convert(req.file.buffer, {
      voice: voice as ConvertOptions['voice'],
      includeSummary: includeSummary !== 'false',
      summaryOnly: summaryOnly === 'true',
    });
    // Replace a trailing .pdf extension (any case) with .mp3.
    const filename = req.file.originalname.replace(/\.pdf$/i, '') + '.mp3';
    res.set({
      'Content-Type': 'audio/mpeg',
      'Content-Disposition': `attachment; filename="${filename}"`,
    });
    res.send(audioBuffer);
  } catch (err) {
    console.error('Conversion failed:', err);
    res.status(500).json({ error: 'PDF conversion failed' });
  }
});

app.listen(3000, () => console.log('PDF-to-audio API on :3000'));
```
For async processing of large PDFs, use the PdfToAudioModule REST API instead.
Environment
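| Variable | Description | Required |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key for TTS | Yes |
| REDIS_HOST | Redis host for the job queue | For module/CLI only |
| REDIS_PORT | Redis port (default: 6379) | For module/CLI only |
| REDIS_PASSWORD | Redis password, read by the module example above | Only if Redis requires authentication |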
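Validating these variables once at startup turns a missing key into a clear error instead of a failed job later. A sketch, assuming the same fallbacks as the module example (localhost and port 6379) and the REDIS_PASSWORD variable that example reads; `loadEnv` is a hypothetical helper, not a package export.

```typescript
// Hypothetical startup check for the environment variables listed above.
interface PdfToAudioEnv {
  openaiApiKey: string;
  redis: { host: string; port: number; password?: string };
}

export function loadEnv(env: Record<string, string | undefined>): PdfToAudioEnv {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) throw new Error('OPENAI_API_KEY is required');

  // Same fallback as the module example; an unset or empty value means 6379.
  const port = parseInt(env.REDIS_PORT || '6379', 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`REDIS_PORT must be a valid TCP port, got "${env.REDIS_PORT}"`);
  }

  return {
    openaiApiKey,
    redis: {
      host: env.REDIS_HOST || 'localhost',
      port,
      password: env.REDIS_PASSWORD, // optional; only needed if Redis requires auth
    },
  };
}

// Usage: const config = loadEnv(process.env);
```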
Options Reference
| Option | Description | Default |
|---|---|---|
| voice | TTS voice: alloy, echo, fable, onyx, nova, shimmer | alloy |
| model | TTS model: tts-1 (fast), tts-1-hd (higher quality) | tts-1 |
| format | Output format: mp3, opus | mp3 |
| includeSummary | Prepend an AI-generated summary at the start of the audio | true |
| summaryOnly | Output only the AI summary and skip reading the full document | false |
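When options arrive as strings (from multipart forms or CLI flags), the defaults in this table can be applied and invalid values rejected in one place. A sketch assuming the values above are the only accepted ones; `resolveOptions` and `pick` are hypothetical helpers, and the package's own option type may be named and shaped differently.

```typescript
// Hypothetical option types mirroring the table above.
const VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'] as const;
const MODELS = ['tts-1', 'tts-1-hd'] as const;
const FORMATS = ['mp3', 'opus'] as const;

type Voice = (typeof VOICES)[number];
type Model = (typeof MODELS)[number];
type Format = (typeof FORMATS)[number];

interface ResolvedOptions {
  voice: Voice;
  model: Model;
  format: Format;
  includeSummary: boolean;
  summaryOnly: boolean;
}

// Return the value if it is allowed, the fallback if it is absent, otherwise throw.
function pick<T extends string>(allowed: readonly T[], value: string | undefined, fallback: T, name: string): T {
  if (value === undefined || value === '') return fallback;
  if ((allowed as readonly string[]).includes(value)) return value as T;
  throw new Error(`Invalid ${name} "${value}"; expected one of: ${allowed.join(', ')}`);
}

// Convert raw string fields into typed options with the documented defaults.
export function resolveOptions(raw: Record<string, string | undefined>): ResolvedOptions {
  return {
    voice: pick(VOICES, raw.voice, 'alloy', 'voice'),
    model: pick(MODELS, raw.model, 'tts-1', 'model'),
    format: pick(FORMATS, raw.format, 'mp3', 'format'),
    includeSummary: raw.includeSummary !== 'false', // default true
    summaryOnly: raw.summaryOnly === 'true', // default false
  };
}
```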
Voice Options
| Voice | Character |
|---|---|
| alloy | Neutral, versatile |
| echo | Warm male voice |
| fable | British accent |
| onyx | Deep, authoritative |
| nova | Friendly female voice |
| shimmer | Expressive female voice |
Job Lifecycle
```
POST /convert         → { jobId: "abc123" }
        ↓
GET /status/abc123    → { status: "pending" }
        ↓
GET /status/abc123    → { status: "processing" }
        ↓
GET /status/abc123    → { status: "completed" }
        ↓
GET /download/abc123  → <MP3 file>
```
Files are stored in outputDir (default: ./data/pdf-to-audio) until the server is restarted. Implement your own cleanup logic, or copy the files to S3 if you need long-term persistence.
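One way to implement that cleanup is a periodic task that deletes old MP3 files from outputDir. A sketch using Node's fs/promises, assuming the finished files are plain .mp3 files directly inside outputDir and that a 24-hour retention is acceptable; `cleanupOutputDir` is a hypothetical helper, not part of the package.

```typescript
import { readdir, stat, unlink } from 'node:fs/promises';
import { join } from 'node:path';

// Hypothetical cleanup task: delete .mp3 files in outputDir whose last modification
// is older than maxAgeMs. Returns the names of the files it removed.
export async function cleanupOutputDir(outputDir: string, maxAgeMs: number, now = Date.now()): Promise<string[]> {
  const removed: string[] = [];
  for (const name of await readdir(outputDir)) {
    if (!name.endsWith('.mp3')) continue; // leave anything that is not audio alone
    const path = join(outputDir, name);
    const info = await stat(path);
    if (info.isFile() && now - info.mtimeMs > maxAgeMs) {
      await unlink(path);
      removed.push(name);
    }
  }
  return removed;
}

// Usage: every hour, remove files older than 24 hours.
// setInterval(() => cleanupOutputDir('./data/pdf-to-audio', 24 * 60 * 60 * 1000).catch(console.error), 60 * 60 * 1000);
```

Deleting by age rather than right after download keeps files available for clients that retry a failed download.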
Related
- AI Package: OpenAI and other LLM provider integrations
- Queue Package: the job queue system powering async processing
- CLI Package: the hazel pdf-to-audio CLI commands
For full API reference, see the PDF-to-Audio package on GitHub.