HazelJS RAG Package
@hazeljs/rag provides Retrieval-Augmented Generation for HazelJS applications with document loaders, knowledge graph retrieval (GraphRAG), memory management, vector stores, and semantic search.
Quick Reference
- Purpose: @hazeljs/rag provides document loading, chunking, embedding, vector storage, semantic search, hybrid search, GraphRAG, and memory management for building RAG applications in HazelJS.
- When to use: Use @hazeljs/rag when a HazelJS application needs to retrieve relevant documents before LLM generation, build a knowledge base, or maintain conversation memory with semantic retrieval. Use @hazeljs/ai alone for simple LLM calls without document retrieval.
- Key concepts: RAGPipeline, Reranker, document loaders (11 built-in), chunking strategies, OpenAIEmbeddings/CohereEmbeddings, vector stores (Memory, Pinecone, Qdrant, Weaviate, ChromaDB), the @SemanticSearch decorator, the @HybridSearch decorator, GraphRAG (knowledge graph), MemoryManager, RagModule.
- Inputs: Documents (text, PDF, web, etc.), user queries, embedding configuration, vector store configuration.
- Outputs: Retrieved document chunks ranked by relevance, context-grounded LLM responses, knowledge graph entities and relationships.
- Dependencies: @hazeljs/core, @hazeljs/ai (for embeddings and LLM generation), a vector store provider.
- Common patterns: Load documents → chunk → embed → index in vector store → query with @SemanticSearch or RAGPipeline → pass retrieved context to the LLM → generate response.
- Common mistakes: Not chunking documents before indexing (large documents produce poor embeddings); using the in-memory vector store in production (not persistent); not adding metadata to documents; setting topK too high (noise) or too low (missing context).
Purpose
Building RAG applications requires integrating vector databases, managing embeddings, loading documents from diverse sources, implementing search strategies, and maintaining conversation context. The @hazeljs/rag package solves all of this in one place:
- 11 Document Loaders: TXT, Markdown, JSON, CSV, HTML, PDF, DOCX, web scraping, YouTube transcripts, GitHub repos, and inline text, all with a unified BaseDocumentLoader API
- GraphRAG: Knowledge graph-based retrieval that extracts entities and relationships, detects communities, and enables entity-centric (local) and thematic (global) search that outperforms flat cosine similarity
- 5 Vector Store Implementations: Memory, Pinecone, Qdrant, Weaviate, and ChromaDB with a unified interface
- Memory System: Conversation tracking, entity memory, fact storage, and working memory for context-aware AI
- Multiple Embedding Providers: OpenAI and Cohere embeddings with easy extensibility
- Advanced Retrieval Strategies: Hybrid search (vector + BM25), multi-query retrieval, and semantic search
- Intelligent Text Splitting: Multiple chunking strategies for optimal retrieval
- RAG + Memory Integration: Combine document retrieval with conversation history for enhanced context
- Decorator-Based API: @Embeddable, @SemanticSearch, and @HybridSearch for declarative RAG
- Production-Ready: Battle-tested patterns with proper error handling and TypeScript support
Architecture
graph TD
A["Documents"] --> B["Text Splitter"]
B --> C["Chunks"]
C --> D["Embedding Provider<br/>(OpenAI, Cohere)"]
D --> E["Vector Embeddings"]
E --> F["Vector Store<br/>(Memory, Pinecone, Qdrant, etc.)"]
G["User Query"] --> H["Embedding Provider"]
H --> I["Query Vector"]
I --> J["Retrieval Strategy<br/>(Semantic, Hybrid, Multi-Query)"]
J --> F
F --> K["Initial Results"]
K --> L["Reranker<br/>(Cohere Rerank 3)"]
L --> M["Ranked Results"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style B fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style C fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style D fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style E fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style F fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
Key Components
- RAG Pipeline: Orchestrates document indexing, query processing, and result retrieval
- Vector Stores: Pluggable storage backends for embeddings and documents
- Embedding Providers: Generate vector embeddings from text
- Retrieval Strategies: Advanced search algorithms (hybrid, multi-query, BM25)
- Rerankers: Pluggable re-ordering of search results for high-precision retrieval (Cohere)
- Text Splitters: Intelligent document chunking for optimal retrieval
- Decorators: @Embeddable, @SemanticSearch, @HybridSearch for declarative RAG
Advantages
Vector Store Flexibility
Start with in-memory storage for development, then seamlessly switch to Pinecone, Qdrant, Weaviate, or ChromaDB for production—all with the same API.
Advanced Retrieval
Built-in support for hybrid search (combining vector and keyword search), multi-query retrieval (generating multiple search queries), and BM25 keyword ranking.
Semantic Reranking
High-precision retrieval with built-in support for Cohere Rerank 3. Vector search retrieves broadly relevant documents; a reranker then re-scores those candidates so the most relevant passages reach the LLM, reducing irrelevant context and the hallucinations it encourages.
Developer Experience
Decorator-based API means you can add RAG capabilities with a single decorator. No need to manage vector stores, embeddings, or search logic manually.
Production Ready
Proper error handling, TypeScript support, connection pooling, and battle-tested patterns make it ready for production use.
Extensible
Easy to add custom vector stores, embedding providers, or retrieval strategies by implementing simple interfaces.
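For example, a custom embedding provider only needs the embed and embedBatch methods used throughout this guide. A minimal sketch, assuming the package exports an EmbeddingProvider interface with that shape (the endpoint and response format below are placeholders):
import type { EmbeddingProvider } from '@hazeljs/rag'; // assumed export name
// Hypothetical provider backed by a self-hosted embedding service
export class SelfHostedEmbeddings implements EmbeddingProvider {
  constructor(private readonly baseUrl: string) {}
  async embed(text: string): Promise<number[]> {
    const res = await fetch(`${this.baseUrl}/embed`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ input: text }),
    });
    const data = await res.json();
    return data.embedding as number[]; // placeholder response shape
  }
  async embedBatch(texts: string[]): Promise<number[][]> {
    // Naive batching: one request per text; a real service would batch server-side
    return Promise.all(texts.map((t) => this.embed(t)));
  }
}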
Installation
# Core RAG package
npm install @hazeljs/rag
# Peer dependencies (choose based on your needs)
npm install openai # For OpenAI embeddings and GraphRAG LLM
# Optional: Vector store clients (install only what you need)
npm install @pinecone-database/pinecone # For Pinecone
npm install @qdrant/js-client-rest # For Qdrant
npm install weaviate-ts-client # For Weaviate
npm install chromadb # For ChromaDB
Optional Document Loader Dependencies:
# For Cohere embeddings
npm install cohere-ai
# For PDF loading (PdfLoader)
npm install pdf-parse
# For Word document loading (DocxLoader)
npm install mammoth
# For CSS-selector web scraping (WebLoader / HtmlFileLoader)
npm install cheerio
Quick Start
Fastest Way: RAGPipeline.from()
The easiest way to get started with RAG using the new factory method:
import { RAGPipeline } from '@hazeljs/rag';
// One-liner setup with sensible defaults
const pipeline = RAGPipeline.from({
provider: 'openai', // or 'cohere'
apiKey: process.env.OPENAI_API_KEY, // optional; falls back to the OPENAI_API_KEY env var when omitted
topK: 5,
llm: async (prompt) => {
// Your LLM function for answer generation
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4',
messages: [{ role: 'user', content: prompt }],
}),
});
const data = await response.json();
return data.choices[0].message.content;
},
reranker: 'cohere', // NEW: Cohere Rerank 3 for high-precision retrieval
});
// Initialize and use
await pipeline.initialize();
// Index documents inline; see Universal Document Ingestion below for file-type auto-detection
await pipeline.addDocuments([
{ content: 'HazelJS is a TypeScript framework...', metadata: { source: 'docs' } },
]);
// Query the knowledge base
const result = await pipeline.query('What is HazelJS?');
console.log(result.answer);
Universal Document Ingestion
The RAGService provides a universal ingest() method that auto-detects file types:
import { RAGService } from '@hazeljs/rag';
const rag = new RAGService({
vectorStore,
embeddingProvider,
llmFunction,
});
// Auto-detects and loads any supported format
await rag.ingest('./docs/guide.pdf'); // PDF
await rag.ingest('./data/faq.csv'); // CSV
await rag.ingest('https://example.com/page'); // Web page
await rag.ingest('./knowledge-base/'); // Entire directory
// Then query
const { answer, sources } = await rag.ask('What is the pricing?');
Manual Setup (Advanced)
For more control, set up the pipeline manually:
import {
RAGPipeline,
OpenAIEmbeddings,
MemoryVectorStore
} from '@hazeljs/rag';
// Setup embeddings provider
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536,
});
// Create vector store
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
// Create RAG pipeline
const rag = new RAGPipeline({
vectorStore,
embeddingProvider: embeddings,
topK: 5, // Return top 5 results
});
await rag.initialize();
// Index documents
await rag.addDocuments([
{
content: 'HazelJS is a modern TypeScript framework for building scalable applications.',
metadata: { category: 'framework', source: 'docs' },
},
{
content: 'The RAG package provides semantic search and vector database integration.',
metadata: { category: 'rag', source: 'docs' },
},
]);
// Query with semantic search
const results = await rag.search('What is HazelJS?', { topK: 3 });
console.log('Search Results:');
results.forEach((result, index) => {
console.log(`${index + 1}. ${result.content}`);
console.log(` Score: ${result.score}`);
console.log(` Metadata:`, result.metadata);
});
Document Loaders
Document loaders are the entry point of every RAG pipeline. They read data from any source and return a standardized Document[] array ready for chunking and indexing. Every real-world application needs them immediately, so @hazeljs/rag ships 11 built-in loaders covering the most common sources.
Loader overview
| Loader | Source | Extra install? |
|---|---|---|
| TextFileLoader | .txt files | — |
| MarkdownFileLoader | .md / .mdx with heading splits and YAML front-matter | — |
| JSONFileLoader | .json arrays or objects with textKey / jsonPointer extraction | — |
| CSVFileLoader | .csv rows mapped to documents with configurable columns | — |
| HtmlFileLoader | .html tag stripping; CSS selectors via cheerio | optional cheerio |
| DirectoryLoader | Recursive directory walk, auto-detects loader by extension | — |
| PdfLoader | PDFs via pdf-parse; split by page or as one document | npm i pdf-parse |
| DocxLoader | Word documents via mammoth; plain text or HTML output | npm i mammoth |
| WebLoader | HTTP page scraping; CSS selectors via cheerio; retry/timeout | optional cheerio |
| YouTubeTranscriptLoader | YouTube transcript download (no API key); segment by duration | — |
| GitHubLoader | GitHub REST API; filter by directory, extension, maxFiles | — |
File loaders
import {
TextFileLoader,
MarkdownFileLoader,
JSONFileLoader,
CSVFileLoader,
HtmlFileLoader,
} from '@hazeljs/rag';
// Plain text — one document per file
const textDocs = await new TextFileLoader({
filePath: './docs/notes.txt',
}).load();
// Markdown — split into one document per heading section
const mdDocs = await new MarkdownFileLoader({
filePath: './docs/guide.md',
splitByHeading: true, // creates one Document per H2/H3 section
parseYamlFrontMatter: true, // front-matter fields become metadata
}).load();
// mdDocs[0].metadata.heading === 'Installation'
// JSON — extract a specific field as the document content
const jsonDocs = await new JSONFileLoader({
filePath: './data/articles.json',
textKey: 'body', // use 'body' field as content
// jsonPointer: '/items', // navigate nested JSON with a JSON Pointer
}).load();
// CSV — map rows to documents; choose which columns become content vs metadata
const csvDocs = await new CSVFileLoader({
filePath: './data/faqs.csv',
contentColumns: ['question', 'answer'],
metadataColumns: ['category'],
}).load();
// HTML — strips all tags, extracts title
const htmlDocs = await new HtmlFileLoader({
filePath: './docs/index.html',
selector: 'main', // optional: only extract content inside <main>
}).load();
DirectoryLoader — bulk ingest
DirectoryLoader walks a directory recursively and automatically delegates each file to the right typed loader. This is the fastest way to ingest a knowledge base from disk:
import { DirectoryLoader } from '@hazeljs/rag';
const docs = await new DirectoryLoader({
dirPath: './knowledge-base',
recursive: true,
// extensions: ['.md', '.txt'], // filter to specific types
// exclude: ['**/node_modules/**'],
}).load();
console.log(`Loaded ${docs.length} documents from ${[...new Set(docs.map(d => d.metadata?.source))].length} files`);
PDF and Word documents
import { PdfLoader, DocxLoader } from '@hazeljs/rag';
// PDF — one document per page or the whole file
const pdfDocs = await new PdfLoader({
filePath: './reports/annual-report.pdf',
splitByPage: true, // each page becomes its own Document
}).load();
// Word document
const wordDocs = await new DocxLoader({
filePath: './contracts/agreement.docx',
outputFormat: 'text', // 'text' (default) or 'html'
}).load();
WebLoader — scrape any URL
import { WebLoader } from '@hazeljs/rag';
// Single URL
const docs = await new WebLoader({
urls: ['https://hazeljs.ai/docs'],
timeout: 10_000,
maxRetries: 3,
// selector: 'article', // optional: CSS selector (requires cheerio)
}).load();
// Multiple URLs in one call
const batchDocs = await new WebLoader({
urls: [
'https://hazeljs.ai/docs/installation',
'https://hazeljs.ai/blog/graphrag',
],
}).load();
YouTubeTranscriptLoader — no API key needed
import { YouTubeTranscriptLoader } from '@hazeljs/rag';
// Works with full URL or just the video ID
const transcriptDocs = await new YouTubeTranscriptLoader({
videoUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
segmentDuration: 60, // group transcript into 60-second chunks
}).load();
// Each doc has metadata: { videoId, startTime, endTime, source }
GitHubLoader — index entire repositories
import { GitHubLoader } from '@hazeljs/rag';
const repoDocs = await new GitHubLoader({
owner: 'hazeljs',
repo: 'hazel',
ref: 'main', // branch or tag
directory: 'docs', // only load this sub-directory
extensions: ['.md', '.mdx'], // only Markdown files
maxFiles: 100,
token: process.env.GITHUB_TOKEN, // optional; avoids 60 req/hr rate limit
}).load();
Custom loaders with @Loader and DocumentLoaderRegistry
Extend BaseDocumentLoader to add any data source. The @Loader decorator registers metadata for auto-detection:
import {
BaseDocumentLoader,
Loader,
DocumentLoaderRegistry,
} from '@hazeljs/rag';
@Loader({
name: 'NotionLoader',
description: 'Loads pages from a Notion database',
extensions: [],
mimeTypes: ['application/vnd.notion'],
})
export class NotionLoader extends BaseDocumentLoader {
constructor(private readonly databaseId: string) {
super();
}
async load() {
// fetchNotionDatabase is a placeholder for your own Notion API client call
const pages = await fetchNotionDatabase(this.databaseId);
return pages.map((page) =>
this.createDocument(page.content, {
source: `notion:${this.databaseId}/${page.id}`,
title: page.title,
lastEdited: page.lastEditedTime,
}),
);
}
}
// Register once at startup — then DirectoryLoader and the registry can use it
DocumentLoaderRegistry.register(
NotionLoader,
(databaseId: string) => new NotionLoader(databaseId),
);
Full ingest pipeline
Putting it all together with the RAG pipeline:
import {
DirectoryLoader,
GitHubLoader,
WebLoader,
RAGPipeline,
OpenAIEmbeddings,
MemoryVectorStore,
RecursiveTextSplitter,
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_API_KEY });
const vectorStore = new MemoryVectorStore(embeddings);
const splitter = new RecursiveTextSplitter({ chunkSize: 800, chunkOverlap: 150 });
const pipeline = new RAGPipeline({ vectorStore, embeddingProvider: embeddings, textSplitter: splitter });
await pipeline.initialize();
// Load from multiple sources
const [localDocs, githubDocs, webDocs] = await Promise.all([
new DirectoryLoader({ dirPath: './knowledge-base', recursive: true }).load(),
new GitHubLoader({ owner: 'hazeljs', repo: 'hazel', directory: 'docs', extensions: ['.md'] }).load(),
new WebLoader({ urls: ['https://hazeljs.ai/docs'] }).load(),
]);
// Index everything at once
const ids = await pipeline.addDocuments([...localDocs, ...githubDocs, ...webDocs]);
console.log(`Indexed ${ids.length} chunks`);
GraphRAG
GraphRAG extends traditional vector search by building a knowledge graph of entities and relationships extracted from your documents. Instead of searching raw text chunks by cosine similarity, it retrieves structured facts and cross-document themes — answering questions that flat vector search cannot.
See the full GraphRAG Guide for an in-depth walkthrough.
Why GraphRAG?
Traditional RAG retrieves the K most similar text chunks. This works well for narrow questions but fails for:
- Cross-document reasoning — "How do all the components in the system relate to each other?"
- Thematic questions — "What are the main architectural layers of this codebase?"
- Entity-relationship queries — "What does the AgentGraph depend on?"
GraphRAG solves this with two complementary retrieval modes:
| Mode | How it works | Best for |
|---|---|---|
| Local | Finds entities matching the query, traverses K hops in the knowledge graph, assembles entity + relationship context | Specific "what is / how does" questions |
| Global | Ranks LLM-generated community reports by relevance; assembles thematic summaries | Broad "what are the main themes / architecture" questions |
| Hybrid | Runs both in parallel, merges contexts, single LLM synthesis call | Best default — covers both dimensions |
Architecture
graph TD
A["Documents"] --> B["Text Chunks"]
B --> C["Entity Extractor<br/>(LLM)"]
C --> D["Knowledge Graph<br/>(GraphStore)"]
D --> E["Community Detector<br/>(Label Propagation)"]
E --> F["Community Summarizer<br/>(LLM Reports)"]
G["User Query"] --> H{"Search Mode"}
H -->|"local"| I["Seed Entity Lookup"]
H -->|"global"| J["Community Report Ranking"]
H -->|"hybrid"| K["Both in Parallel"]
I --> L["BFS Graph Traversal<br/>(K hops)"]
L --> M["Entity + Relationship Context"]
J --> N["Top-K Report Summaries"]
K --> O["Merged Context"]
M --> P["LLM Synthesis"]
N --> P
O --> P
P --> Q["Answer + Sources"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
style F fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style Q fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
Building the knowledge graph
import OpenAI from 'openai';
import {
GraphRAGPipeline,
DirectoryLoader,
} from '@hazeljs/rag';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Create the pipeline — provide an LLM function for extraction and synthesis
const graphRag = new GraphRAGPipeline({
llm: async (prompt) => {
const res = await openai.chat.completions.create({
model: 'gpt-4o-mini',
temperature: 0,
messages: [{ role: 'user', content: prompt }],
});
return res.choices[0].message.content ?? '';
},
extractionChunkSize: 2000, // max chars per LLM extraction call
generateCommunityReports: true, // produce LLM summaries per community cluster
maxCommunitySize: 15, // split communities larger than this
localSearchDepth: 2, // BFS hops for local search
localSearchTopK: 5, // seed entities per query
globalSearchTopK: 5, // community reports used in global search
});
// Load documents from any source
const docs = await new DirectoryLoader({ dirPath: './knowledge-base', recursive: true }).load();
// build() extracts entities, builds the graph, detects communities, and writes reports
const stats = await graphRag.build(docs);
console.log(stats);
// {
// documentsProcessed: 12,
// entitiesExtracted: 47,
// relationshipsExtracted: 63,
// communitiesDetected: 8,
// communityReportsGenerated: 8,
// duration: 18400,
// }
Local search — entity-centric
Best for specific, factual questions about named concepts, technologies, or processes:
const result = await graphRag.search('How does HazelJS dependency injection work?', {
mode: 'local',
depth: 2, // traverse up to 2 hops from seed entities
topK: 5, // start from 5 seed entities
});
console.log(result.answer);
// "HazelJS uses constructor injection. When the IoC container resolves
// a @Service(), it reads TypeScript metadata to identify constructor
// parameters and injects resolved instances automatically..."
console.log(result.entities.map(e => `${e.name} [${e.type}]`));
// ['Dependency Injection [CONCEPT]', 'IoC Container [TECHNOLOGY]',
// '@Service [FEATURE]', 'HazelJS [TECHNOLOGY]', ...]
console.log(result.relationships.map(r => `${r.type}: ${r.description}`));
// ['USES: HazelJS uses constructor injection pattern', ...]
Global search — community reports
Best for broad questions about themes, architecture, or the overall scope of a knowledge base:
const result = await graphRag.search(
'What are the main architectural layers of the HazelJS framework?',
{
mode: 'global',
topK: 5, // include top 5 community reports by relevance
},
);
console.log(result.communities[0]);
// {
// communityId: 'community_0',
// title: 'HazelJS Core Infrastructure Layer',
// summary: 'This community represents the foundational layer of HazelJS...',
// findings: ['HazelJS Core provides HTTP and DI foundation', ...],
// rating: 9,
// }
Hybrid search — best default
Runs local and global in parallel and merges their contexts before a single LLM synthesis call:
const result = await graphRag.search(
'What vector stores does @hazeljs/rag support and how do I swap them?',
{
mode: 'hybrid', // default when mode is omitted
includeGraph: true, // include entities + relationships in result
includeCommunities: true,
},
);
console.log(`${result.mode} search in ${result.duration}ms`);
console.log(`Entities found: ${result.entities.length}`);
console.log(`Communities used: ${result.communities.length}`);
Entity and relationship types
The LLM extractor maps every concept to one of these canonical types, making the graph consistent and queryable:
Entity types: CONCEPT · TECHNOLOGY · PERSON · ORGANIZATION · PROCESS · FEATURE · EVENT · LOCATION · OTHER
Relationship types: USES · IMPLEMENTS · CREATED_BY · PART_OF · DEPENDS_ON · RELATED_TO · EXTENDS · CONFIGURES · TRIGGERS · PRODUCES · REPLACES · OTHER
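For type-safe handling of extracted entities, these canonical types can be modeled as string-literal unions. An illustrative sketch (the aliases below are not exports of @hazeljs/rag; the object shapes mirror the graph objects shown under "Inspect the graph"):
type EntityType =
  | 'CONCEPT' | 'TECHNOLOGY' | 'PERSON' | 'ORGANIZATION'
  | 'PROCESS' | 'FEATURE' | 'EVENT' | 'LOCATION' | 'OTHER';
type RelationshipType =
  | 'USES' | 'IMPLEMENTS' | 'CREATED_BY' | 'PART_OF' | 'DEPENDS_ON' | 'RELATED_TO'
  | 'EXTENDS' | 'CONFIGURES' | 'TRIGGERS' | 'PRODUCES' | 'REPLACES' | 'OTHER';
// Shapes mirroring the entity and relationship objects returned by getGraph()
interface GraphEntity {
  id: string;
  name: string;
  type: EntityType;
  description: string;
  sourceDocIds: string[];
}
interface GraphRelationship {
  id: string;
  sourceId: string;
  targetId: string;
  type: RelationshipType;
  description: string;
  weight: number;
}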
Incremental updates
Add new documents to an existing graph without rebuilding from scratch:
// Add a new batch of documents to the existing graph
const updateStats = await graphRag.addDocuments(newDocs);
// Graph re-runs community detection and regenerates reports after each batch
Inspect the graph
The full knowledge graph is available for visualization (D3.js, Cytoscape.js, etc.):
const graph = graphRag.getGraph();
// Entities
console.log([...graph.entities.values()].slice(0, 3));
// [{ id, name, type, description, sourceDocIds }, ...]
// Relationships
console.log([...graph.relationships.values()].slice(0, 3));
// [{ id, sourceId, targetId, type, description, weight }, ...]
// Community reports
console.log([...graph.communityReports.values()].map(r => r.title));
// ['HazelJS Core DI System', 'RAG Pipeline & Vector Stores', ...]
// Statistics
const stats = graphRag.getStats();
console.log(stats.entityTypeBreakdown);
// { TECHNOLOGY: 14, CONCEPT: 12, FEATURE: 9, PROCESS: 7, ... }
console.log(stats.topEntities.slice(0, 3));
// [{ name: 'HazelJS', connections: 12 }, ...]
GraphRAG vs traditional RAG
| | Traditional RAG | GraphRAG |
|---|---|---|
| Storage | Flat vector index | Knowledge graph + vector index |
| Retrieval unit | Text chunk | Entity + relationships + community |
| Cross-document reasoning | Limited | Native |
| Broad thematic questions | Poor | Excellent (community reports) |
| Specific entity questions | Good | Excellent (BFS traversal) |
| Setup cost | Low | Medium (LLM extraction pass) |
| Token cost per query | Low | Medium |
| Best use case | Q&A over focused docs | Multi-document knowledge bases |
Vector Stores
The RAG package supports 5 vector store implementations with a unified interface.
Memory Vector Store (Development)
In-memory storage with no external dependencies. Perfect for development and testing.
Advantages:
- Zero setup required
- Extremely fast
- No external dependencies
- Great for testing and CI/CD
Limitations:
- Data lost on restart
- Limited to available memory
- Not suitable for production
import { MemoryVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
// Use it
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });
Pinecone Vector Store (Production, Serverless)
Fully managed, serverless vector database with automatic scaling.
Advantages:
- Fully managed (no infrastructure)
- Auto-scaling
- Global distribution
- High performance
- Excellent for serverless deployments
Limitations:
- Paid service (free tier available)
- Network latency compared to self-hosted alternatives
import { PineconeVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new PineconeVectorStore(embeddings, {
apiKey: process.env.PINECONE_API_KEY,
environment: process.env.PINECONE_ENVIRONMENT,
indexName: 'my-knowledge-base',
});
await vectorStore.initialize();
// Same API as Memory store
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });
Setup:
- Sign up at pinecone.io
- Create an index with dimension matching your embeddings (1536 for OpenAI text-embedding-3-small)
- Get your API key and environment from the dashboard
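The index can also be created programmatically. A hedged sketch using the official Pinecone client (index name, cloud, and region are placeholders; check your SDK version for the exact createIndex options):
import { Pinecone } from '@pinecone-database/pinecone';
const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
// Dimension must match the embedding model: 1536 for text-embedding-3-small
await pc.createIndex({
  name: 'my-knowledge-base',
  dimension: 1536,
  metric: 'cosine',
  spec: { serverless: { cloud: 'aws', region: 'us-east-1' } },
});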
Qdrant Vector Store (High-Performance, Self-Hosted)
Rust-based vector database optimized for speed and efficiency.
Advantages:
- Extremely fast (Rust-based)
- Advanced filtering capabilities
- Self-hosted (full control)
- Open-source
- Cost-effective for large datasets
Limitations:
- Requires infrastructure management
- Setup complexity
import { QdrantVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new QdrantVectorStore(embeddings, {
url: process.env.QDRANT_URL || 'http://localhost:6333',
collectionName: 'my-knowledge-base',
});
await vectorStore.initialize();
Setup with Docker:
docker run -p 6333:6333 qdrant/qdrant
Weaviate Vector Store (GraphQL, Flexible)
Open-source vector database with GraphQL API and advanced features.
Advantages:
- GraphQL API
- Flexible schema
- Built-in vectorization
- Hybrid search support
- Multi-tenancy
Limitations:
- Requires infrastructure
- Learning curve for GraphQL
import { WeaviateVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new WeaviateVectorStore(embeddings, {
host: process.env.WEAVIATE_HOST || 'http://localhost:8080',
className: 'MyKnowledgeBase',
});
await vectorStore.initialize();
Setup with Docker:
docker run -p 8080:8080 semitechnologies/weaviate:latest
ChromaDB Vector Store (Prototyping, Embedded)
Lightweight, embeddable vector database perfect for prototyping.
Advantages:
- Easy setup
- Lightweight
- Can run embedded or as a server
- Great for prototyping
- Python and JavaScript support
Limitations:
- Less mature than alternatives
- Limited scalability for very large datasets
import { ChromaVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new ChromaVectorStore(embeddings, {
url: process.env.CHROMA_URL || 'http://localhost:8000',
collectionName: 'my-knowledge-base',
});
await vectorStore.initialize();
// ChromaDB-specific features
const stats = await vectorStore.getStats();
console.log('Collection size:', stats.count);
const preview = await vectorStore.peek(5);
console.log('First 5 documents:', preview);
Setup with Docker:
docker run -p 8000:8000 chromadb/chroma
Vector Store Comparison
| Feature | Memory | Pinecone | Qdrant | Weaviate | ChromaDB |
|---|---|---|---|---|---|
| Setup | None | API Key | Docker | Docker | Docker |
| Persistence | ❌ | ✅ | ✅ | ✅ | ✅ |
| Scalability | Low | High | High | High | Medium |
| Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Cost | Free | Paid | Free (OSS) | Free (OSS) | Free (OSS) |
| Best For | Dev/Test | Production | High-perf | GraphQL | Prototyping |
| Metadata Filtering | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hybrid Search | ❌ | ✅ | ✅ | ✅ | ❌ |
| Multi-tenancy | ❌ | ✅ | ✅ | ✅ | ❌ |
Embedding Providers
Embedding providers convert text into vector representations for semantic search.
OpenAI Embeddings
State-of-the-art embeddings from OpenAI with multiple model options.
Models:
- text-embedding-3-small: 1536 dimensions, fast and cost-effective
- text-embedding-3-large: 3072 dimensions, highest quality
- text-embedding-ada-002: Legacy model, 1536 dimensions
import { OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536, // Optional: reduce dimensions for faster search
});
// Embed single text
const vector = await embeddings.embed('Hello world');
console.log('Vector dimensions:', vector.length);
// Embed multiple texts (batch)
const vectors = await embeddings.embedBatch([
'First document',
'Second document',
'Third document',
]);
Cohere Embeddings
Multilingual embeddings from Cohere with excellent performance.
Models:
- embed-english-v3.0: English-only, high quality
- embed-multilingual-v3.0: 100+ languages
- embed-english-light-v3.0: Faster, smaller model
import { CohereEmbeddings } from '@hazeljs/rag';
const embeddings = new CohereEmbeddings({
apiKey: process.env.COHERE_API_KEY,
model: 'embed-english-v3.0',
inputType: 'search_document', // or 'search_query'
});
const vector = await embeddings.embed('Hello world');
Retrieval Strategies
Advanced strategies that improve recall and precision over plain vector similarity search.
Hybrid Search
Combines vector similarity search with BM25 keyword search for best results.
graph LR
A["Query"] --> B["Vector Search<br/>(Semantic)"]
A --> C["BM25 Search<br/>(Keyword)"]
B --> D["Score Fusion"]
C --> D
D --> E["Ranked Results"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style B fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
style E fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
How it works:
- Performs vector similarity search (semantic understanding)
- Performs BM25 keyword search (exact term matching)
- Normalizes scores from both methods
- Combines scores with configurable weights
- Returns re-ranked results
import {
HybridSearchRetrieval,
MemoryVectorStore,
OpenAIEmbeddings
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const hybridSearch = new HybridSearchRetrieval(vectorStore, {
vectorWeight: 0.7, // 70% weight to semantic search
keywordWeight: 0.3, // 30% weight to keyword search
topK: 10,
});
// Add documents
await vectorStore.addDocuments(documents);
// Search with hybrid strategy
const results = await hybridSearch.search('machine learning algorithms', {
topK: 5,
});
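Conceptually, the score-fusion step is min-max normalization of each result set followed by a weighted sum. A minimal sketch of the idea (illustrative, not the library's internal implementation):
interface Scored {
  id: string;
  score: number;
}
// Min-max normalize scores into [0, 1] so the two methods are comparable
function normalize(results: Scored[]): Map<string, number> {
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min || 1; // guard against identical scores
  return new Map(results.map((r) => [r.id, (r.score - min) / range]));
}
function fuse(vector: Scored[], keyword: Scored[], vectorWeight = 0.7, keywordWeight = 0.3): Scored[] {
  const v = normalize(vector);
  const k = normalize(keyword);
  const ids = new Set([...v.keys(), ...k.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score: vectorWeight * (v.get(id) ?? 0) + keywordWeight * (k.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}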
Multi-Query Retrieval
Generates multiple query variations using an LLM to improve recall.
graph TD
A["Original Query"] --> B["LLM Query Generator"]
B --> C["Query Variation 1"]
B --> D["Query Variation 2"]
B --> E["Query Variation 3"]
C --> F["Vector Search"]
D --> F
E --> F
F --> G["Deduplicate & Rank"]
G --> H["Final Results"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style B fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style D fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style E fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style F fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
style G fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
style H fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
How it works:
- Takes user's original question
- Uses LLM to generate multiple variations
- Searches with each variation
- Deduplicates results
- Re-ranks by frequency and average score
import {
MultiQueryRetrieval,
MemoryVectorStore,
OpenAIEmbeddings
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const multiQuery = new MultiQueryRetrieval(vectorStore, {
llmApiKey: process.env.OPENAI_API_KEY,
numQueries: 3, // Generate 3 query variations
topK: 10,
});
// Add documents
await vectorStore.addDocuments(documents);
// Search with multiple query variations
const results = await multiQuery.search('How do I deploy my app?', {
topK: 5,
});
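The deduplicate-and-rank step can be pictured as grouping hits by document id across all query variations, then scoring each document by how many variations retrieved it and its average similarity. An illustrative sketch (not the library's internals):
interface Hit {
  id: string;
  score: number;
}
function mergeVariationResults(resultSets: Hit[][]): Hit[] {
  const grouped = new Map<string, number[]>();
  for (const hits of resultSets) {
    for (const hit of hits) {
      const scores = grouped.get(hit.id) ?? [];
      scores.push(hit.score);
      grouped.set(hit.id, scores);
    }
  }
  return [...grouped.entries()]
    .map(([id, scores]) => ({
      id,
      // Frequency across variations boosts the average similarity score
      score:
        (scores.length / resultSets.length) *
        (scores.reduce((sum, s) => sum + s, 0) / scores.length),
    }))
    .sort((a, b) => b.score - a.score);
}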
Text Splitters
Intelligent document chunking for optimal retrieval.
Recursive Character Text Splitter
Splits text recursively by trying different separators (paragraphs, sentences, words).
import { RecursiveCharacterTextSplitter } from '@hazeljs/rag';
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000, // Target chunk size in characters
chunkOverlap: 200, // Overlap between chunks for context
separators: ['\n\n', '\n', '. ', ' '], // Try these in order
});
const chunks = await splitter.splitText(longDocument);
console.log(`Split into ${chunks.length} chunks`);
chunks.forEach((chunk, i) => {
console.log(`Chunk ${i + 1}: ${chunk.substring(0, 50)}...`);
});
Character Text Splitter
Simple character-based splitting with overlap.
import { CharacterTextSplitter } from '@hazeljs/rag';
const splitter = new CharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 50,
separator: '\n',
});
const chunks = await splitter.splitText(document);
Token Text Splitter
Splits by token count (useful for LLM context windows).
import { TokenTextSplitter } from '@hazeljs/rag';
const splitter = new TokenTextSplitter({
chunkSize: 512, // Max tokens per chunk
chunkOverlap: 50, // Overlap in tokens
encodingName: 'cl100k_base', // OpenAI encoding
});
const chunks = await splitter.splitText(document);
Decorators
Declarative RAG with decorators.
@Embeddable
Mark a class as embeddable for automatic vector storage.
import { Embeddable, Embedded } from '@hazeljs/rag';
@Embeddable({
vectorStore: 'memory',
embeddingProvider: 'openai',
})
class Article {
@Embedded()
title: string;
@Embedded()
content: string;
metadata: {
author: string;
date: Date;
};
}
@SemanticSearch
Add semantic search to a method.
import { Controller, Get, Query } from '@hazeljs/common';
import { SemanticSearch } from '@hazeljs/rag';
@Controller('search')
class SearchController {
@Get()
@SemanticSearch({
vectorStore: 'pinecone',
topK: 5,
})
async search(@Query('q') query: string) {
// Results automatically injected
return { query, results: this.searchResults };
}
}
@HybridSearch
Add hybrid search (vector + keyword) to a method.
import { Controller, Get, Query } from '@hazeljs/common';
import { HybridSearch } from '@hazeljs/rag';
@Controller('search')
class SearchController {
@Get('hybrid')
@HybridSearch({
vectorStore: 'qdrant',
vectorWeight: 0.7,
keywordWeight: 0.3,
topK: 10,
})
async hybridSearch(@Query('q') query: string) {
return { query, results: this.searchResults };
}
}
Advanced Retrieval Methods
The RAGService provides several advanced retrieval methods beyond basic semantic search:
Multi-Query Retrieval
Generate multiple search queries from a single question to improve recall:
// Generates 3 variations of the query and combines results
const results = await rag.multiQuery('What is HazelJS?', 3);
Conversational RAG
Maintain conversation context across multiple turns:
const sessionId = 'user-123';
// First question
const result1 = await rag.chat('What is HazelJS?', sessionId);
// Follow-up question - automatically uses conversation history
const result2 = await rag.chat('How do I install it?', sessionId);
// Clear conversation when done
rag.clearChat(sessionId);
Hybrid Search
Combine vector similarity with BM25 keyword search:
const results = await rag.hybridSearch('TypeScript framework', {
topK: 10,
vectorWeight: 0.7, // 70% vector similarity
keywordWeight: 0.3, // 30% keyword matching
});
Context Compression
Remove redundant and low-relevance results:
const results = await rag.search('query', { topK: 20 });
const compressed = await rag.compress(results, 'query');
// Returns ~5 most relevant, non-redundant results
Reranking
Re-score results using LLM-based cross-encoder scoring:
const results = await rag.search('query', { topK: 20 });
const reranked = await rag.rerank(results, 'query', 5);
// Returns top 5 after LLM reranking
Ensemble Retrieval
Combine multiple retrieval strategies with weighted fusion:
import { RetrievalStrategy } from '@hazeljs/rag';
const results = await rag.ensemble(
'query',
[RetrievalStrategy.SIMILARITY, RetrievalStrategy.HYBRID, RetrievalStrategy.MMR],
[0.5, 0.3, 0.2] // Weights for each strategy
);
Error Handling
HazelJS provides typed error classes for robust RAG error handling:
import { RAGService, RAGError, RAGErrorCode } from '@hazeljs/rag';
try {
await rag.ingest('./documents/guide.pdf');
} catch (error) {
if (error instanceof RAGError) {
switch (error.code) {
case RAGErrorCode.MISSING_DEPENDENCY:
console.error('Install required package:', error.message);
// Error includes package name and install command
break;
case RAGErrorCode.LOADER_ERROR:
console.error('Failed to load document:', error.message);
break;
case RAGErrorCode.EMBEDDING_ERROR:
console.error('Embedding generation failed:', error.message);
// Retry or use fallback
break;
case RAGErrorCode.VECTOR_STORE_ERROR:
console.error('Vector store error:', error.message);
break;
default:
console.error('RAG error:', error.message);
}
}
throw error;
}
Available Error Codes
- VECTOR_STORE_ERROR - Vector database operation failed
- EMBEDDING_ERROR - Embedding generation failed
- LOADER_ERROR - Document loading failed
- SPLITTER_ERROR - Text splitting failed
- LLM_GENERATION_ERROR - Answer generation failed
- INDEX_ERROR - Document indexing failed
- RETRIEVAL_ERROR - Search/retrieval failed
- UNSUPPORTED_FORMAT - File format not supported
- MISSING_DEPENDENCY - Required package not installed
- CONFIGURATION_ERROR - Invalid configuration
Debugging
Enable debug logging with the HAZELJS_DEBUG environment variable:
HAZELJS_DEBUG=true npm start
Debug output for RAG operations:
2024-03-23T12:00:00.000Z [hazeljs:rag] ingest start source=./docs/guide.pdf
2024-03-23T12:00:01.200Z [hazeljs:rag] ingest loaded docs=15
2024-03-23T12:00:02.500Z [hazeljs:rag] ingest complete ids=15
2024-03-23T12:00:03.000Z [hazeljs:rag] search query=What is HazelJS? strategy=similarity
2024-03-23T12:00:03.300Z [hazeljs:rag] search results=5
2024-03-23T12:00:04.100Z [hazeljs:rag] ask query=What is HazelJS?
2024-03-23T12:00:05.500Z [hazeljs:rag] ask complete answer_len=245 sources=5
Best Practices
Choose the Right Vector Store
- Development: Use MemoryVectorStore for fast iteration
- Production (Serverless): Use PineconeVectorStore for zero infrastructure
- Production (Self-Hosted): Use QdrantVectorStore for performance and cost
- Prototyping: Use ChromaVectorStore for quick setup
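Because every store shares the same interface, the choice can be made once at startup. A sketch using the constructors documented above (the VECTOR_STORE environment variable is a placeholder convention):
import {
  MemoryVectorStore,
  PineconeVectorStore,
  QdrantVectorStore,
  OpenAIEmbeddings,
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_API_KEY });
function createVectorStore() {
  switch (process.env.VECTOR_STORE) {
    case 'pinecone':
      return new PineconeVectorStore(embeddings, {
        apiKey: process.env.PINECONE_API_KEY,
        environment: process.env.PINECONE_ENVIRONMENT,
        indexName: 'my-knowledge-base',
      });
    case 'qdrant':
      return new QdrantVectorStore(embeddings, {
        url: process.env.QDRANT_URL || 'http://localhost:6333',
        collectionName: 'my-knowledge-base',
      });
    default:
      return new MemoryVectorStore(embeddings); // dev/test fallback
  }
}
const vectorStore = createVectorStore();
await vectorStore.initialize();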
Optimize Chunk Size
// For Q&A: Smaller chunks (200-500 chars)
const qaChunks = new RecursiveCharacterTextSplitter({
chunkSize: 300,
chunkOverlap: 50,
});
// For summarization: Larger chunks (1000-2000 chars)
const summaryChunks = new RecursiveCharacterTextSplitter({
chunkSize: 1500,
chunkOverlap: 200,
});
Use Metadata Filtering
// Add metadata when indexing
await vectorStore.addDocuments([
{
content: 'Document content',
metadata: {
category: 'technical',
date: '2024-01-01',
author: 'John Doe',
},
},
]);
// Filter during search
const results = await vectorStore.search('query', {
topK: 5,
filter: {
category: 'technical',
date: { $gte: '2024-01-01' },
},
});
Implement Caching
import { CacheService } from '@hazeljs/cache';
class RAGService {
constructor(
private vectorStore: VectorStore,
private cache: CacheService,
) {}
async search(query: string) {
const cacheKey = `search:${query}`;
// Check cache first
const cached = await this.cache.get(cacheKey);
if (cached) return cached;
// Perform search
const results = await this.vectorStore.search(query);
// Cache results
await this.cache.set(cacheKey, results, 3600); // 1 hour
return results;
}
}
Monitor Performance
async function searchWithMetrics(query: string) {
const start = Date.now();
try {
const results = await vectorStore.search(query);
const duration = Date.now() - start;
console.log(`Search completed in ${duration}ms`);
console.log(`Found ${results.length} results`);
return results;
} catch (error) {
console.error('Search failed:', error);
throw error;
}
}
Troubleshooting
Connection Errors
// Add retry logic
async function connectWithRetry(vectorStore: VectorStore, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
await vectorStore.initialize();
console.log('Connected successfully');
return;
} catch (error) {
console.log(`Connection attempt ${i + 1} failed`);
if (i === maxRetries - 1) throw error;
await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
}
}
}
Dimension Mismatch
// Ensure embedding dimensions match vector store configuration
// OpenAI text-embedding-3-small = 1536 dimensions
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536, // Must match index
});
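A quick runtime probe catches a mismatch before any documents are indexed. A minimal sketch, assuming embed() returns a plain number[] as in the examples above:
const EXPECTED_DIMENSIONS = 1536; // must match the vector store index configuration
const probe = await embeddings.embed('dimension probe');
if (probe.length !== EXPECTED_DIMENSIONS) {
  throw new Error(
    `Embedding dimension ${probe.length} does not match index dimension ${EXPECTED_DIMENSIONS}`,
  );
}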
Docker Setup for Self-Hosted Stores
# Qdrant
docker run -p 6333:6333 qdrant/qdrant
# Weaviate
docker run -p 8080:8080 semitechnologies/weaviate:latest
# ChromaDB
docker run -p 8000:8000 chromadb/chroma
Low Search Quality
- Increase chunk overlap: More context between chunks
- Adjust chunk size: Smaller chunks for precise retrieval
- Use hybrid search: Combine semantic and keyword search
- Add metadata filtering: Narrow down search scope
- Try multi-query retrieval: Generate multiple search variations
High Latency
- Use batch operations: Process multiple documents at once
- Cache embeddings: Store embeddings with documents
- Optimize topK: Request fewer results
- Use production vector stores: Pinecone, Qdrant, or Weaviate
- Enable connection pooling: For self-hosted databases
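For the batch-operations point above, prefer the embedBatch API shown in Embedding Providers over per-document calls. A small sketch:
// One batch call amortizes request overhead across many texts
const texts = ['First document', 'Second document', 'Third document'];
const vectors = await embeddings.embedBatch(texts);
console.log(`Embedded ${vectors.length} texts in one batch`);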
Memory System
The RAG package includes a powerful memory system for building context-aware AI applications. See the Memory System Guide for complete documentation.
Quick Example
import {
RAGPipelineWithMemory,
MemoryManager,
HybridMemory,
BufferMemory,
VectorMemory,
} from '@hazeljs/rag';
// Setup memory
const buffer = new BufferMemory({ maxSize: 20 });
const vectorMemory = new VectorMemory(vectorStore, embeddings);
const hybridMemory = new HybridMemory(buffer, vectorMemory);
const memoryManager = new MemoryManager(hybridMemory, {
maxConversationLength: 20,
summarizeAfter: 50,
entityExtraction: true,
});
// Create RAG with memory
const rag = new RAGPipelineWithMemory(
{ vectorStore, embeddingProvider: embeddings },
memoryManager,
llmFunction
);
// Query with conversation context
const response = await rag.queryWithMemory(
'What did we discuss about pricing?',
'session-123',
'user-456'
);
console.log(response.answer);
console.log('Memories:', response.memories);
console.log('History:', response.conversationHistory);
Memory Features
- Conversation Memory: Track multi-turn conversations with auto-summarization
- Entity Memory: Remember people, companies, and relationships
- Fact Storage: Store and recall facts semantically
- Working Memory: Temporary context for current tasks
- Hybrid Storage: Fast buffer + persistent vector storage
- Semantic Search: Find relevant memories using embeddings
Learn more in the Memory System Guide.
What's Next?
- Read the Document Loaders Guide for deep dives on every loader
- Explore the GraphRAG Guide for knowledge graph retrieval
- Explore the Memory System for context-aware AI
- Learn about AI Package for LLM integration with RAG
- Explore Caching to optimize RAG performance
- Check out Config for managing API keys
- Read the Vector Stores Guide for detailed setup
- See RAG Patterns for advanced techniques
- Compare RAG vs Agentic RAG to choose the right approach
- Explore Agentic RAG for autonomous retrieval strategies
Related Resources
- AI Package – LLM providers for RAG completion
- Eval Package – Precision/recall@k, golden datasets, and CI checks for retrieval quality
- Agent Package – Multi-agent RAG with autonomous retrieval
- Memory Package – Conversation and entity memory
- Prompts Package – Prompt templates for RAG queries
- Data Package – Data processing and ETL pipelines
- hazeljs-rag-documents-starter – Full RAG example with GraphRAG, multiple loaders, and vector stores
Recipes
Recipe: Ingest Documents and Search
// File: src/knowledge/knowledge.service.ts
import { Service } from '@hazeljs/core';
import { RAGPipeline, DirectoryLoader } from '@hazeljs/rag';
@Service()
export class KnowledgeService {
constructor(private readonly rag: RAGPipeline) {}
async ingestFolder(folderPath: string) {
const loader = new DirectoryLoader({ dirPath: folderPath, recursive: true, extensions: ['.md', '.txt', '.pdf'] });
const documents = await loader.load();
await this.rag.addDocuments(documents);
return { ingested: documents.length };
}
async search(query: string) {
const results = await this.rag.search(query, { topK: 5 });
return results.map(r => ({
content: r.content,
score: r.score,
source: r.metadata.source,
}));
}
}
Recipe: RAG-Powered Q&A Endpoint
// File: src/qa/qa.controller.ts
import { Controller, Post, Body } from '@hazeljs/core';
import { RAGPipeline } from '@hazeljs/rag';
import { AIEnhancedService } from '@hazeljs/ai';
@Controller('qa')
export class QAController {
constructor(
private readonly rag: RAGPipeline,
private readonly ai: AIEnhancedService,
) {}
@Post()
async ask(@Body('question') question: string) {
const context = await this.rag.search(question, { topK: 5 });
const contextText = context.map(r => r.content).join('\n\n');
const answer = await this.ai
.chat(question)
.system(`Answer based on the following context:\n\n${contextText}`)
.model('gpt-4-turbo-preview')
.text();
return { answer, sources: context.map(r => r.metadata.source) };
}
}
Recipe: Hybrid Search with Reranking
// File: src/search/search.service.ts
import { Service } from '@hazeljs/core';
import { RAGPipeline } from '@hazeljs/rag';
@Service()
export class SearchService {
constructor(private readonly rag: RAGPipeline) {}
async hybridSearch(query: string) {
const results = await this.rag.search(query, {
topK: 10,
strategy: 'hybrid',
semanticWeight: 0.7,
keywordWeight: 0.3,
rerank: true,
rerankTopK: 5,
});
return results;
}
}
API Reference
For complete API documentation, see the RAG API Reference.