RAG Package
The @hazeljs/rag package provides a comprehensive Retrieval-Augmented Generation (RAG) implementation with document loaders, knowledge graph retrieval (GraphRAG), memory management, and semantic search — everything you need to build production-grade AI knowledge bases.
Purpose
Building RAG applications requires integrating vector databases, managing embeddings, loading documents from diverse sources, implementing search strategies, and maintaining conversation context. The @hazeljs/rag package solves all of this in one place:
- 11 Document Loaders: TXT, Markdown, JSON, CSV, HTML, PDF, DOCX, web scraping, YouTube transcripts, GitHub repos, and inline text — all with a unified BaseDocumentLoader API
- GraphRAG: Knowledge graph-based retrieval that extracts entities and relationships, detects communities, and enables entity-centric (local) and thematic (global) search that outperforms flat cosine similarity
- 5 Vector Store Implementations: Memory, Pinecone, Qdrant, Weaviate, and ChromaDB with a unified interface
- Memory System: Conversation tracking, entity memory, fact storage, and working memory for context-aware AI
- Multiple Embedding Providers: OpenAI and Cohere embeddings with easy extensibility
- Advanced Retrieval Strategies: Hybrid search (vector + BM25), multi-query retrieval, and semantic search
- Intelligent Text Splitting: Multiple chunking strategies for optimal retrieval
- RAG + Memory Integration: Combine document retrieval with conversation history for enhanced context
- Decorator-Based API: @Embeddable, @SemanticSearch, @HybridSearch for declarative RAG
- Production-Ready: Battle-tested patterns with proper error handling and TypeScript support
Architecture
graph TD
A["Documents"] --> B["Text Splitter"]
B --> C["Chunks"]
C --> D["Embedding Provider<br/>(OpenAI, Cohere)"]
D --> E["Vector Embeddings"]
E --> F["Vector Store<br/>(Memory, Pinecone, Qdrant, etc.)"]
G["User Query"] --> H["Embedding Provider"]
H --> I["Query Vector"]
I --> J["Retrieval Strategy<br/>(Semantic, Hybrid, Multi-Query)"]
J --> F
F --> K["Ranked Results"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style B fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style C fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style D fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style E fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style F fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
Key Components
- RAG Pipeline: Orchestrates document indexing, query processing, and result retrieval
- Vector Stores: Pluggable storage backends for embeddings and documents
- Embedding Providers: Generate vector embeddings from text
- Retrieval Strategies: Advanced search algorithms (hybrid, multi-query, BM25)
- Text Splitters: Intelligent document chunking for optimal retrieval
- Decorators: @Embeddable, @SemanticSearch, @HybridSearch for declarative RAG
Advantages
Vector Store Flexibility
Start with in-memory storage for development, then seamlessly switch to Pinecone, Qdrant, Weaviate, or ChromaDB for production—all with the same API.
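As a sketch of that pattern, backend selection can live behind a single factory. The interface below is a simplified stand-in for the package's real VectorStore shape (the real one has more methods, such as addDocuments), and the store classes are illustrative stubs:

```typescript
// Simplified stand-ins for the package's VectorStore interface — illustrative only.
interface SearchResult {
  content: string;
  score: number;
}

interface VectorStore {
  initialize(): Promise<void>;
  search(query: string, opts?: { topK?: number }): Promise<SearchResult[]>;
}

class InMemoryStore implements VectorStore {
  async initialize(): Promise<void> {}
  async search(): Promise<SearchResult[]> {
    return [];
  }
}

class RemoteStore implements VectorStore {
  constructor(readonly url: string) {}
  async initialize(): Promise<void> {}
  async search(): Promise<SearchResult[]> {
    return [];
  }
}

// Callers depend only on VectorStore; the backend is a configuration decision.
function createVectorStore(env: 'development' | 'production'): VectorStore {
  return env === 'production'
    ? new RemoteStore('http://localhost:6333')
    : new InMemoryStore();
}
```

Because application code only ever sees the interface, promoting from development to production is a one-line configuration change rather than a refactor.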
Advanced Retrieval
Built-in support for hybrid search (combining vector and keyword search), multi-query retrieval (generating multiple search queries), and BM25 keyword ranking.
Developer Experience
Decorator-based API means you can add RAG capabilities with a single decorator. No need to manage vector stores, embeddings, or search logic manually.
Production Ready
Proper error handling, TypeScript support, connection pooling, and battle-tested patterns make it ready for production use.
Extensible
Easy to add custom vector stores, embedding providers, or retrieval strategies by implementing simple interfaces.
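For example, a custom embedding provider only needs embed and embedBatch (the two methods the built-in providers expose). The interface shape here is a simplified stand-in, and the deterministic character-hashing embedder is a toy for demonstration only — not a usable embedding model:

```typescript
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
}

// Toy embedder: buckets character codes into a fixed-size, L2-normalized vector.
class HashingEmbeddings implements EmbeddingProvider {
  constructor(private readonly dimensions = 64) {}

  async embed(text: string): Promise<number[]> {
    const vec = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vec[text.charCodeAt(i) % this.dimensions] += 1;
    }
    const norm = Math.hypot(...vec) || 1; // avoid division by zero for ''
    return vec.map((v) => v / norm);
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map((t) => this.embed(t)));
  }
}
```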
Installation
# Core RAG package
npm install @hazeljs/rag
# Peer dependencies (choose based on your needs)
npm install openai # For OpenAI embeddings and GraphRAG LLM
# Optional: Vector store clients (install only what you need)
npm install @pinecone-database/pinecone # For Pinecone
npm install @qdrant/js-client-rest # For Qdrant
npm install weaviate-ts-client # For Weaviate
npm install chromadb # For ChromaDB
Optional Document Loader Dependencies:
# For Cohere embeddings
npm install cohere-ai
# For PDF loading (PdfLoader)
npm install pdf-parse
# For Word document loading (DocxLoader)
npm install mammoth
# For CSS-selector web scraping (WebLoader / HtmlFileLoader)
npm install cheerio
Quick Start
Basic RAG Pipeline
The simplest way to get started with RAG:
import {
RAGPipeline,
OpenAIEmbeddings,
MemoryVectorStore
} from '@hazeljs/rag';
// Setup embeddings provider
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536,
});
// Create vector store
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
// Create RAG pipeline
const rag = new RAGPipeline({
vectorStore,
embeddingProvider: embeddings,
topK: 5, // Return top 5 results
});
await rag.initialize();
// Index documents
await rag.addDocuments([
{
content: 'HazelJS is a modern TypeScript framework for building scalable applications.',
metadata: { category: 'framework', source: 'docs' },
},
{
content: 'The RAG package provides semantic search and vector database integration.',
metadata: { category: 'rag', source: 'docs' },
},
]);
// Query with semantic search
const results = await rag.search('What is HazelJS?', { topK: 3 });
console.log('Search Results:');
results.forEach((result, index) => {
console.log(`${index + 1}. ${result.content}`);
console.log(` Score: ${result.score}`);
console.log(` Metadata:`, result.metadata);
});
Document Loaders
Document loaders are the entry point of every RAG pipeline. They read data from any source and return a standardised Document[] array ready for chunking and indexing. Every real-world application needs them immediately — @hazeljs/rag ships 11 built-in loaders covering every common source.
Loader overview
| Loader | Source | Extra install? |
|---|---|---|
| TextFileLoader | .txt files | — |
| MarkdownFileLoader | .md / .mdx with heading splits and YAML front-matter | — |
| JSONFileLoader | .json arrays or objects with textKey / jsonPointer extraction | — |
| CSVFileLoader | .csv rows mapped to documents with configurable columns | — |
| HtmlFileLoader | .html tag stripping; CSS selectors via cheerio | optional cheerio |
| DirectoryLoader | Recursive directory walk, auto-detects loader by extension | — |
| PdfLoader | PDFs via pdf-parse; split by page or as one document | npm i pdf-parse |
| DocxLoader | Word documents via mammoth; plain text or HTML output | npm i mammoth |
| WebLoader | HTTP page scraping; CSS selectors via cheerio; retry/timeout | optional cheerio |
| YouTubeTranscriptLoader | YouTube transcript download (no API key); segment by duration | — |
| GitHubLoader | GitHub REST API; filter by directory, extension, maxFiles | — |
File loaders
import {
TextFileLoader,
MarkdownFileLoader,
JSONFileLoader,
CSVFileLoader,
HtmlFileLoader,
} from '@hazeljs/rag';
// Plain text — one document per file
const textDocs = await new TextFileLoader({
filePath: './docs/notes.txt',
}).load();
// Markdown — split into one document per heading section
const mdDocs = await new MarkdownFileLoader({
filePath: './docs/guide.md',
splitByHeading: true, // creates one Document per H2/H3 section
parseYamlFrontMatter: true, // front-matter fields become metadata
}).load();
// mdDocs[0].metadata.heading === 'Installation'
// JSON — extract a specific field as the document content
const jsonDocs = await new JSONFileLoader({
filePath: './data/articles.json',
textKey: 'body', // use 'body' field as content
// jsonPointer: '/items', // navigate nested JSON with a JSON Pointer
}).load();
// CSV — map rows to documents; choose which columns become content vs metadata
const csvDocs = await new CSVFileLoader({
filePath: './data/faqs.csv',
contentColumns: ['question', 'answer'],
metadataColumns: ['category'],
}).load();
// HTML — strips all tags, extracts title
const htmlDocs = await new HtmlFileLoader({
filePath: './docs/index.html',
selector: 'main', // optional: only extract content inside <main>
}).load();
DirectoryLoader — bulk ingest
DirectoryLoader walks a directory recursively and automatically delegates each file to the right typed loader. This is the fastest way to ingest a knowledge base from disk:
import { DirectoryLoader } from '@hazeljs/rag';
const docs = await new DirectoryLoader({
dirPath: './knowledge-base',
recursive: true,
// extensions: ['.md', '.txt'], // filter to specific types
// exclude: ['**/node_modules/**'],
}).load();
console.log(`Loaded ${docs.length} documents from ${[...new Set(docs.map(d => d.metadata?.source))].length} files`);
PDF and Word documents
import { PdfLoader, DocxLoader } from '@hazeljs/rag';
// PDF — one document per page or the whole file
const pdfDocs = await new PdfLoader({
filePath: './reports/annual-report.pdf',
splitByPage: true, // each page becomes its own Document
}).load();
// Word document
const wordDocs = await new DocxLoader({
filePath: './contracts/agreement.docx',
outputFormat: 'text', // 'text' (default) or 'html'
}).load();
WebLoader — scrape any URL
import { WebLoader } from '@hazeljs/rag';
// Single URL
const docs = await new WebLoader({
urls: ['https://hazeljs.com/docs'],
timeout: 10_000,
maxRetries: 3,
// selector: 'article', // optional: CSS selector (requires cheerio)
}).load();
// Multiple URLs in one call
const batchDocs = await new WebLoader({
urls: [
'https://hazeljs.com/docs/installation',
'https://hazeljs.com/blog/graphrag',
],
}).load();
YouTubeTranscriptLoader — no API key needed
import { YouTubeTranscriptLoader } from '@hazeljs/rag';
// Works with full URL or just the video ID
const transcriptDocs = await new YouTubeTranscriptLoader({
videoUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
segmentDuration: 60, // group transcript into 60-second chunks
}).load();
// Each doc has metadata: { videoId, startTime, endTime, source }
GitHubLoader — index entire repositories
import { GitHubLoader } from '@hazeljs/rag';
const repoDocs = await new GitHubLoader({
owner: 'hazeljs',
repo: 'hazel',
ref: 'main', // branch or tag
directory: 'docs', // only load this sub-directory
extensions: ['.md', '.mdx'], // only Markdown files
maxFiles: 100,
token: process.env.GITHUB_TOKEN, // optional; avoids 60 req/hr rate limit
}).load();
Custom loaders with @Loader and DocumentLoaderRegistry
Extend BaseDocumentLoader to add any data source. The @Loader decorator registers metadata for auto-detection:
import {
BaseDocumentLoader,
Loader,
DocumentLoaderRegistry,
} from '@hazeljs/rag';
@Loader({
name: 'NotionLoader',
description: 'Loads pages from a Notion database',
extensions: [],
mimeTypes: ['application/vnd.notion'],
})
export class NotionLoader extends BaseDocumentLoader {
constructor(private readonly databaseId: string) {
super();
}
async load() {
const pages = await fetchNotionDatabase(this.databaseId);
return pages.map((page) =>
this.createDocument(page.content, {
source: `notion:${this.databaseId}/${page.id}`,
title: page.title,
lastEdited: page.lastEditedTime,
}),
);
}
}
// Register once at startup — then DirectoryLoader and the registry can use it
DocumentLoaderRegistry.register(
NotionLoader,
(databaseId: string) => new NotionLoader(databaseId),
);
Full ingest pipeline
Putting it all together with the RAG pipeline:
import {
DirectoryLoader,
GitHubLoader,
WebLoader,
RAGPipeline,
OpenAIEmbeddings,
MemoryVectorStore,
RecursiveCharacterTextSplitter,
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_API_KEY });
const vectorStore = new MemoryVectorStore(embeddings);
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 800, chunkOverlap: 150 });
const pipeline = new RAGPipeline({ vectorStore, embeddingProvider: embeddings, textSplitter: splitter });
await pipeline.initialize();
// Load from multiple sources
const [localDocs, githubDocs, webDocs] = await Promise.all([
new DirectoryLoader({ dirPath: './knowledge-base', recursive: true }).load(),
new GitHubLoader({ owner: 'hazeljs', repo: 'hazel', directory: 'docs', extensions: ['.md'] }).load(),
new WebLoader({ urls: ['https://hazeljs.com/docs'] }).load(),
]);
// Index everything at once
const ids = await pipeline.addDocuments([...localDocs, ...githubDocs, ...webDocs]);
console.log(`Indexed ${ids.length} chunks`);
GraphRAG
GraphRAG extends traditional vector search by building a knowledge graph of entities and relationships extracted from your documents. Instead of searching raw text chunks by cosine similarity, it retrieves structured facts and cross-document themes — answering questions that flat vector search cannot.
See the full GraphRAG Guide for an in-depth walkthrough.
Why GraphRAG?
Traditional RAG retrieves the K most similar text chunks. This works well for narrow questions but fails for:
- Cross-document reasoning — "How do all the components in the system relate to each other?"
- Thematic questions — "What are the main architectural layers of this codebase?"
- Entity-relationship queries — "What does the AgentGraph depend on?"
GraphRAG solves this with two complementary retrieval modes:
| Mode | How it works | Best for |
|---|---|---|
| Local | Finds entities matching the query, traverses K hops in the knowledge graph, assembles entity + relationship context | Specific "what is / how does" questions |
| Global | Ranks LLM-generated community reports by relevance; assembles thematic summaries | Broad "what are the main themes / architecture" questions |
| Hybrid | Runs both in parallel, merges contexts, single LLM synthesis call | Best default — covers both dimensions |
Architecture
graph TD
A["Documents"] --> B["Text Chunks"]
B --> C["Entity Extractor<br/>(LLM)"]
C --> D["Knowledge Graph<br/>(GraphStore)"]
D --> E["Community Detector<br/>(Label Propagation)"]
E --> F["Community Summarizer<br/>(LLM Reports)"]
G["User Query"] --> H{"Search Mode"}
H -->|"local"| I["Seed Entity Lookup"]
H -->|"global"| J["Community Report Ranking"]
H -->|"hybrid"| K["Both in Parallel"]
I --> L["BFS Graph Traversal<br/>(K hops)"]
L --> M["Entity + Relationship Context"]
J --> N["Top-K Report Summaries"]
K --> O["Merged Context"]
M --> P["LLM Synthesis"]
N --> P
O --> P
P --> Q["Answer + Sources"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
style F fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style Q fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
Building the knowledge graph
import OpenAI from 'openai';
import {
GraphRAGPipeline,
DirectoryLoader,
} from '@hazeljs/rag';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Create the pipeline — provide an LLM function for extraction and synthesis
const graphRag = new GraphRAGPipeline({
llm: async (prompt) => {
const res = await openai.chat.completions.create({
model: 'gpt-4o-mini',
temperature: 0,
messages: [{ role: 'user', content: prompt }],
});
return res.choices[0].message.content ?? '';
},
extractionChunkSize: 2000, // max chars per LLM extraction call
generateCommunityReports: true, // produce LLM summaries per community cluster
maxCommunitySize: 15, // split communities larger than this
localSearchDepth: 2, // BFS hops for local search
localSearchTopK: 5, // seed entities per query
globalSearchTopK: 5, // community reports used in global search
});
// Load documents from any source
const docs = await new DirectoryLoader({ dirPath: './knowledge-base', recursive: true }).load();
// build() extracts entities, builds the graph, detects communities, and writes reports
const stats = await graphRag.build(docs);
console.log(stats);
// {
// documentsProcessed: 12,
// entitiesExtracted: 47,
// relationshipsExtracted: 63,
// communitiesDetected: 8,
// communityReportsGenerated: 8,
// duration: 18400,
// }
Local search — entity-centric
Best for specific, factual questions about named concepts, technologies, or processes:
const result = await graphRag.search('How does HazelJS dependency injection work?', {
mode: 'local',
depth: 2, // traverse up to 2 hops from seed entities
topK: 5, // start from 5 seed entities
});
console.log(result.answer);
// "HazelJS uses constructor injection. When the IoC container resolves
// a @Service(), it reads TypeScript metadata to identify constructor
// parameters and injects resolved instances automatically..."
console.log(result.entities.map(e => `${e.name} [${e.type}]`));
// ['Dependency Injection [CONCEPT]', 'IoC Container [TECHNOLOGY]',
// '@Service [FEATURE]', 'HazelJS [TECHNOLOGY]', ...]
console.log(result.relationships.map(r => `${r.type}: ${r.description}`));
// ['USES: HazelJS uses constructor injection pattern', ...]
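Conceptually, the K-hop traversal behind local search is a breadth-first search outward from the seed entities. The sketch below uses a hypothetical simplified adjacency-list graph, not the package's internal GraphStore:

```typescript
// Hypothetical simplified graph: adjacency list keyed by entity id.
type Graph = Map<string, string[]>;

// Collect every entity reachable within `depth` hops of the seed entities.
function kHopNeighborhood(graph: Graph, seeds: string[], depth: number): Set<string> {
  const visited = new Set<string>(seeds);
  let frontier = seeds;
  for (let hop = 0; hop < depth; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of graph.get(id) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next; // only newly discovered entities expand on the next hop
  }
  return visited;
}
```

With depth: 2, entities two hops away from a seed are included in the context while anything further is excluded — which is why raising depth widens (and lengthens) the assembled context.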
Global search — community reports
Best for broad questions about themes, architecture, or the overall scope of a knowledge base:
const result = await graphRag.search(
'What are the main architectural layers of the HazelJS framework?',
{
mode: 'global',
topK: 5, // include top 5 community reports by relevance
},
);
console.log(result.communities[0]);
// {
// communityId: 'community_0',
// title: 'HazelJS Core Infrastructure Layer',
// summary: 'This community represents the foundational layer of HazelJS...',
// findings: ['HazelJS Core provides HTTP and DI foundation', ...],
// rating: 9,
// }
Hybrid search — best default
Runs local and global in parallel and merges their contexts before a single LLM synthesis call:
const result = await graphRag.search(
'What vector stores does @hazeljs/rag support and how do I swap them?',
{
mode: 'hybrid', // default when mode is omitted
includeGraph: true, // include entities + relationships in result
includeCommunities: true,
},
);
console.log(`${result.mode} search in ${result.duration}ms`);
console.log(`Entities found: ${result.entities.length}`);
console.log(`Communities used: ${result.communities.length}`);
Entity and relationship types
The LLM extractor maps every concept to one of these canonical types, making the graph consistent and queryable:
Entity types: CONCEPT · TECHNOLOGY · PERSON · ORGANIZATION · PROCESS · FEATURE · EVENT · LOCATION · OTHER
Relationship types: USES · IMPLEMENTS · CREATED_BY · PART_OF · DEPENDS_ON · RELATED_TO · EXTENDS · CONFIGURES · TRIGGERS · PRODUCES · REPLACES · OTHER
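These vocabularies map naturally onto TypeScript union types. The shapes below mirror the entity and relationship fields shown by getGraph() further down, but they are illustrative definitions, not the package's exported types:

```typescript
// Hypothetical type definitions mirroring the canonical vocabularies above.
type EntityType =
  | 'CONCEPT' | 'TECHNOLOGY' | 'PERSON' | 'ORGANIZATION' | 'PROCESS'
  | 'FEATURE' | 'EVENT' | 'LOCATION' | 'OTHER';

type RelationshipType =
  | 'USES' | 'IMPLEMENTS' | 'CREATED_BY' | 'PART_OF' | 'DEPENDS_ON'
  | 'RELATED_TO' | 'EXTENDS' | 'CONFIGURES' | 'TRIGGERS' | 'PRODUCES'
  | 'REPLACES' | 'OTHER';

interface Entity {
  id: string;
  name: string;
  type: EntityType;
  description: string;
  sourceDocIds: string[];
}

interface Relationship {
  id: string;
  sourceId: string; // entity id
  targetId: string; // entity id
  type: RelationshipType;
  description: string;
  weight: number;
}
```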
Incremental updates
Add new documents to an existing graph without rebuilding from scratch:
// Add a new batch of documents to the existing graph
const updateStats = await graphRag.addDocuments(newDocs);
// Graph re-runs community detection and regenerates reports after each batch
Inspect the graph
The full knowledge graph is available for visualisation (D3.js, Cytoscape.js, etc.):
const graph = graphRag.getGraph();
// Entities
console.log([...graph.entities.values()].slice(0, 3));
// [{ id, name, type, description, sourceDocIds }, ...]
// Relationships
console.log([...graph.relationships.values()].slice(0, 3));
// [{ id, sourceId, targetId, type, description, weight }, ...]
// Community reports
console.log([...graph.communityReports.values()].map(r => r.title));
// ['HazelJS Core DI System', 'RAG Pipeline & Vector Stores', ...]
// Statistics
const stats = graphRag.getStats();
console.log(stats.entityTypeBreakdown);
// { TECHNOLOGY: 14, CONCEPT: 12, FEATURE: 9, PROCESS: 7, ... }
console.log(stats.topEntities.slice(0, 3));
// [{ name: 'HazelJS', connections: 12 }, ...]
GraphRAG vs traditional RAG
| | Traditional RAG | GraphRAG |
|---|---|---|
| Storage | Flat vector index | Knowledge graph + vector index |
| Retrieval unit | Text chunk | Entity + relationships + community |
| Cross-document reasoning | Limited | Native |
| Broad thematic questions | Poor | Excellent (community reports) |
| Specific entity questions | Good | Excellent (BFS traversal) |
| Setup cost | Low | Medium (LLM extraction pass) |
| Token cost per query | Low | Medium |
| Best use case | Q&A over focused docs | Multi-document knowledge bases |
Vector Stores
The RAG package supports 5 vector store implementations with a unified interface.
Memory Vector Store (Development)
In-memory storage with no external dependencies. Perfect for development and testing.
Advantages:
- Zero setup required
- Extremely fast
- No external dependencies
- Great for testing and CI/CD
Limitations:
- Data lost on restart
- Limited to available memory
- Not suitable for production
import { MemoryVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
// Use it
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });
Pinecone Vector Store (Production, Serverless)
Fully managed, serverless vector database with automatic scaling.
Advantages:
- Fully managed (no infrastructure)
- Auto-scaling
- Global distribution
- High performance
- Excellent for serverless deployments
Limitations:
- Paid service (free tier available)
- Network latency compared to self-hosted alternatives
import { PineconeVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new PineconeVectorStore(embeddings, {
apiKey: process.env.PINECONE_API_KEY,
environment: process.env.PINECONE_ENVIRONMENT,
indexName: 'my-knowledge-base',
});
await vectorStore.initialize();
// Same API as Memory store
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });
Setup:
- Sign up at pinecone.io
- Create an index with dimension matching your embeddings (1536 for OpenAI text-embedding-3-small)
- Get your API key and environment from the dashboard
Qdrant Vector Store (High-Performance, Self-Hosted)
Rust-based vector database optimized for speed and efficiency.
Advantages:
- Extremely fast (Rust-based)
- Advanced filtering capabilities
- Self-hosted (full control)
- Open-source
- Cost-effective for large datasets
Limitations:
- Requires infrastructure management
- Setup complexity
import { QdrantVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new QdrantVectorStore(embeddings, {
url: process.env.QDRANT_URL || 'http://localhost:6333',
collectionName: 'my-knowledge-base',
});
await vectorStore.initialize();
Setup with Docker:
docker run -p 6333:6333 qdrant/qdrant
Weaviate Vector Store (GraphQL, Flexible)
Open-source vector database with GraphQL API and advanced features.
Advantages:
- GraphQL API
- Flexible schema
- Built-in vectorization
- Hybrid search support
- Multi-tenancy
Limitations:
- Requires infrastructure
- Learning curve for GraphQL
import { WeaviateVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new WeaviateVectorStore(embeddings, {
host: process.env.WEAVIATE_HOST || 'http://localhost:8080',
className: 'MyKnowledgeBase',
});
await vectorStore.initialize();
Setup with Docker:
docker run -p 8080:8080 semitechnologies/weaviate:latest
ChromaDB Vector Store (Prototyping, Embedded)
Lightweight, embeddable vector database perfect for prototyping.
Advantages:
- Easy setup
- Lightweight
- Can run embedded or as a server
- Great for prototyping
- Python and JavaScript support
Limitations:
- Less mature than alternatives
- Limited scalability for very large datasets
import { ChromaVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new ChromaVectorStore(embeddings, {
url: process.env.CHROMA_URL || 'http://localhost:8000',
collectionName: 'my-knowledge-base',
});
await vectorStore.initialize();
// ChromaDB-specific features
const stats = await vectorStore.getStats();
console.log('Collection size:', stats.count);
const preview = await vectorStore.peek(5);
console.log('First 5 documents:', preview);
Setup with Docker:
docker run -p 8000:8000 chromadb/chroma
Vector Store Comparison
| Feature | Memory | Pinecone | Qdrant | Weaviate | ChromaDB |
|---|---|---|---|---|---|
| Setup | None | API Key | Docker | Docker | Docker |
| Persistence | ❌ | ✅ | ✅ | ✅ | ✅ |
| Scalability | Low | High | High | High | Medium |
| Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Cost | Free | Paid | Free (OSS) | Free (OSS) | Free (OSS) |
| Best For | Dev/Test | Production | High-perf | GraphQL | Prototyping |
| Metadata Filtering | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hybrid Search | ❌ | ✅ | ✅ | ✅ | ❌ |
| Multi-tenancy | ❌ | ✅ | ✅ | ✅ | ❌ |
Embedding Providers
Embedding providers convert text into vector representations for semantic search.
OpenAI Embeddings
State-of-the-art embeddings from OpenAI with multiple model options.
Models:
- text-embedding-3-small: 1536 dimensions, fast and cost-effective
- text-embedding-3-large: 3072 dimensions, highest quality
- text-embedding-ada-002: Legacy model, 1536 dimensions
import { OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536, // Optional: reduce dimensions for faster search
});
// Embed single text
const vector = await embeddings.embed('Hello world');
console.log('Vector dimensions:', vector.length);
// Embed multiple texts (batch)
const vectors = await embeddings.embedBatch([
'First document',
'Second document',
'Third document',
]);
Cohere Embeddings
Multilingual embeddings from Cohere with excellent performance.
Models:
- embed-english-v3.0: English-only, high quality
- embed-multilingual-v3.0: 100+ languages
- embed-english-light-v3.0: Faster, smaller model
import { CohereEmbeddings } from '@hazeljs/rag';
const embeddings = new CohereEmbeddings({
apiKey: process.env.COHERE_API_KEY,
model: 'embed-english-v3.0',
inputType: 'search_document', // or 'search_query'
});
const vector = await embeddings.embed('Hello world');
Retrieval Strategies
Advanced search strategies for better results.
Hybrid Search
Combines vector similarity search with BM25 keyword search for best results.
graph LR
A["Query"] --> B["Vector Search<br/>(Semantic)"]
A --> C["BM25 Search<br/>(Keyword)"]
B --> D["Score Fusion"]
C --> D
D --> E["Ranked Results"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style B fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
style E fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
How it works:
- Performs vector similarity search (semantic understanding)
- Performs BM25 keyword search (exact term matching)
- Normalizes scores from both methods
- Combines scores with configurable weights
- Returns re-ranked results
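The normalize-and-combine steps can be sketched as a small standalone score-fusion helper. This is an illustrative implementation of the idea, not the package's internal code:

```typescript
interface Scored {
  id: string;
  score: number;
}

// Min-max normalize one result list to [0, 1] so the two scales are comparable.
function normalize(results: Scored[]): Map<string, number> {
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min || 1; // guard against identical scores
  return new Map(results.map((r) => [r.id, (r.score - min) / range]));
}

// Weighted sum of normalized vector and keyword scores; a document missing
// from one list simply contributes 0 from that side.
function fuse(
  vectorResults: Scored[],
  keywordResults: Scored[],
  vectorWeight = 0.7,
  keywordWeight = 0.3,
): Scored[] {
  const v = normalize(vectorResults);
  const k = normalize(keywordResults);
  const ids = new Set([...v.keys(), ...k.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score: vectorWeight * (v.get(id) ?? 0) + keywordWeight * (k.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

A document that ranks highly in both searches ends up above one that dominates only a single list, which is exactly the behaviour the vectorWeight/keywordWeight options tune.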
import {
HybridSearchRetrieval,
MemoryVectorStore,
OpenAIEmbeddings
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const hybridSearch = new HybridSearchRetrieval(vectorStore, {
vectorWeight: 0.7, // 70% weight to semantic search
keywordWeight: 0.3, // 30% weight to keyword search
topK: 10,
});
// Add documents
await vectorStore.addDocuments(documents);
// Search with hybrid strategy
const results = await hybridSearch.search('machine learning algorithms', {
topK: 5,
});
Multi-Query Retrieval
Generates multiple query variations using an LLM to improve recall.
graph TD
A["Original Query"] --> B["LLM Query Generator"]
B --> C["Query Variation 1"]
B --> D["Query Variation 2"]
B --> E["Query Variation 3"]
C --> F["Vector Search"]
D --> F
E --> F
F --> G["Deduplicate & Rank"]
G --> H["Final Results"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style B fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style D fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style E fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style F fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
style G fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
style H fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
How it works:
- Takes user's original question
- Uses LLM to generate multiple variations
- Searches with each variation
- Deduplicates results
- Re-ranks by frequency and average score
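The deduplicate-and-rank steps can be sketched as a standalone merge function. This illustrates the idea (frequency first, average score as tie-breaker); the package's internal ranking may differ in detail:

```typescript
interface Hit {
  id: string;
  score: number;
}

// Merge hits from several query variations: a document retrieved by more
// variations ranks higher; ties break on average score.
function mergeMultiQuery(resultSets: Hit[][]): Hit[] {
  const acc = new Map<string, { count: number; total: number }>();
  for (const results of resultSets) {
    for (const hit of results) {
      const entry = acc.get(hit.id) ?? { count: 0, total: 0 };
      entry.count += 1;
      entry.total += hit.score;
      acc.set(hit.id, entry);
    }
  }
  return [...acc.entries()]
    .map(([id, { count, total }]) => ({ id, score: total / count, count }))
    .sort((a, b) => b.count - a.count || b.score - a.score)
    .map(({ id, score }) => ({ id, score }));
}
```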
import {
MultiQueryRetrieval,
MemoryVectorStore,
OpenAIEmbeddings
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const multiQuery = new MultiQueryRetrieval(vectorStore, {
llmApiKey: process.env.OPENAI_API_KEY,
numQueries: 3, // Generate 3 query variations
topK: 10,
});
// Add documents
await vectorStore.addDocuments(documents);
// Search with multiple query variations
const results = await multiQuery.search('How do I deploy my app?', {
topK: 5,
});
Text Splitters
Intelligent document chunking for optimal retrieval.
Recursive Character Text Splitter
Splits text recursively by trying different separators (paragraphs, sentences, words).
import { RecursiveCharacterTextSplitter } from '@hazeljs/rag';
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000, // Target chunk size in characters
chunkOverlap: 200, // Overlap between chunks for context
separators: ['\n\n', '\n', '. ', ' '], // Try these in order
});
const chunks = await splitter.splitText(longDocument);
console.log(`Split into ${chunks.length} chunks`);
chunks.forEach((chunk, i) => {
console.log(`Chunk ${i + 1}: ${chunk.substring(0, 50)}...`);
});
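The recursion itself fits in a few lines: try the coarsest separator first and fall back to finer ones only for pieces that are still over the size limit. This sketch is illustrative — the real splitter also merges adjacent small pieces and applies chunkOverlap, which are omitted here:

```typescript
// Recursive splitting sketch: coarse separators first, hard cut as last resort.
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = ['\n\n', '\n', '. ', ' '],
): string[] {
  if (text.length <= chunkSize) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separators left: hard-cut at the size limit.
    const chunks: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      chunks.push(text.slice(i, i + chunkSize));
    }
    return chunks;
  }
  return text
    .split(sep)
    .filter((piece) => piece.length > 0)
    .flatMap((piece) =>
      piece.length > chunkSize ? recursiveSplit(piece, chunkSize, rest) : [piece],
    );
}
```

Splitting at paragraph and sentence boundaries first is what keeps semantically related text together in one chunk, which is why this strategy usually retrieves better than a fixed-width cut.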
Character Text Splitter
Simple character-based splitting with overlap.
import { CharacterTextSplitter } from '@hazeljs/rag';
const splitter = new CharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 50,
separator: '\n',
});
const chunks = await splitter.splitText(document);
Token Text Splitter
Splits by token count (useful for LLM context windows).
import { TokenTextSplitter } from '@hazeljs/rag';
const splitter = new TokenTextSplitter({
chunkSize: 512, // Max tokens per chunk
chunkOverlap: 50, // Overlap in tokens
encodingName: 'cl100k_base', // OpenAI encoding
});
const chunks = await splitter.splitText(document);
Decorators
Declarative RAG with decorators.
@Embeddable
Mark a class as embeddable for automatic vector storage.
import { Embeddable, Embedded } from '@hazeljs/rag';
@Embeddable({
vectorStore: 'memory',
embeddingProvider: 'openai',
})
class Article {
@Embedded()
title: string;
@Embedded()
content: string;
metadata: {
author: string;
date: Date;
};
}
@SemanticSearch
Add semantic search to a method.
import { Controller, Get } from '@hazeljs/common';
import { SemanticSearch } from '@hazeljs/rag';
@Controller('search')
class SearchController {
@Get()
@SemanticSearch({
vectorStore: 'pinecone',
topK: 5,
})
async search(@Query('q') query: string) {
// Results automatically injected
return { query, results: this.searchResults };
}
}
@HybridSearch
Add hybrid search (vector + keyword) to a method.
import { Controller, Get } from '@hazeljs/common';
import { HybridSearch } from '@hazeljs/rag';
@Controller('search')
class SearchController {
@Get('hybrid')
@HybridSearch({
vectorStore: 'qdrant',
vectorWeight: 0.7,
keywordWeight: 0.3,
topK: 10,
})
async hybridSearch(@Query('q') query: string) {
return { query, results: this.searchResults };
}
}
Best Practices
Choose the Right Vector Store
- Development: Use MemoryVectorStore for fast iteration
- Production (Serverless): Use PineconeVectorStore for zero infrastructure
- Production (Self-Hosted): Use QdrantVectorStore for performance and cost
- Prototyping: Use ChromaVectorStore for quick setup
Optimize Chunk Size
// For Q&A: Smaller chunks (200-500 chars)
const qaChunks = new RecursiveCharacterTextSplitter({
chunkSize: 300,
chunkOverlap: 50,
});
// For summarization: Larger chunks (1000-2000 chars)
const summaryChunks = new RecursiveCharacterTextSplitter({
chunkSize: 1500,
chunkOverlap: 200,
});
Use Metadata Filtering
// Add metadata when indexing
await vectorStore.addDocuments([
{
content: 'Document content',
metadata: {
category: 'technical',
date: '2024-01-01',
author: 'John Doe',
},
},
]);
// Filter during search
const results = await vectorStore.search('query', {
topK: 5,
filter: {
category: 'technical',
date: { $gte: '2024-01-01' },
},
});
Implement Caching
import { CacheService } from '@hazeljs/cache';
class RAGService {
constructor(
private vectorStore: VectorStore,
private cache: CacheService,
) {}
async search(query: string) {
const cacheKey = `search:${query}`;
// Check cache first
const cached = await this.cache.get(cacheKey);
if (cached) return cached;
// Perform search
const results = await this.vectorStore.search(query);
// Cache results
await this.cache.set(cacheKey, results, 3600); // 1 hour
return results;
}
}
Monitor Performance
async function searchWithMetrics(query: string) {
const start = Date.now();
try {
const results = await vectorStore.search(query);
const duration = Date.now() - start;
console.log(`Search completed in ${duration}ms`);
console.log(`Found ${results.length} results`);
return results;
} catch (error) {
console.error('Search failed:', error);
throw error;
}
}
Troubleshooting
Connection Errors
// Add retry logic
async function connectWithRetry(vectorStore: VectorStore, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
await vectorStore.initialize();
console.log('Connected successfully');
return;
} catch (error) {
console.log(`Connection attempt ${i + 1} failed`);
if (i === maxRetries - 1) throw error;
await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
}
}
}
Dimension Mismatch
// Ensure embedding dimensions match vector store configuration
// OpenAI text-embedding-3-small = 1536 dimensions
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536, // Must match index
});
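A cheap sanity check at startup catches dimension mismatches before they surface as cryptic vector-store errors. This is a sketch; `indexDimensions` here stands for whatever dimension your store's index was created with:

```typescript
// Fail fast if an embedding's length doesn't match the index configuration.
function assertDimensions(embedding: number[], indexDimensions: number): void {
  if (embedding.length !== indexDimensions) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, but the index expects ${indexDimensions}`,
    );
  }
}

// e.g. run once against the first embedding produced at startup
assertDimensions(new Array(1536).fill(0), 1536); // ok
```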
Docker Setup for Self-Hosted Stores
# Qdrant
docker run -p 6333:6333 qdrant/qdrant
# Weaviate
docker run -p 8080:8080 semitechnologies/weaviate:latest
# ChromaDB
docker run -p 8000:8000 chromadb/chroma
Low Search Quality
- Increase chunk overlap: More context between chunks
- Adjust chunk size: Smaller chunks for precise retrieval
- Use hybrid search: Combine semantic and keyword search
- Add metadata filtering: Narrow down search scope
- Try multi-query retrieval: Generate multiple search variations
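Multi-query retrieval runs several phrasings of the same question and merges the ranked result lists. Reciprocal rank fusion is a common, score-free way to do the merge; the sketch below is self-contained (generating the query variations with an LLM is elided):

```typescript
// Reciprocal rank fusion: a document's score is the sum of 1 / (k + rank)
// over every result list it appears in. k = 60 is the conventional constant.
function reciprocalRankFusion(resultLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of resultLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// 'b' ranks first because it appears in both lists
console.log(reciprocalRankFusion([['a', 'b'], ['b', 'c']]));
```

Documents retrieved by several query variations rise to the top, which makes the final ranking robust to any single badly-phrased query.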
High Latency
- Use batch operations: Process multiple documents at once
- Cache embeddings: Store embeddings with documents
- Optimize topK: Request fewer results
- Use production vector stores: Pinecone, Qdrant, or Weaviate
- Enable connection pooling: For self-hosted databases
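Batching is usually the biggest latency and cost win when indexing: one embedding request per batch of documents instead of one per document. A generic batching helper might look like this (illustrative; plug in your embedding provider's bulk call as `fn`):

```typescript
// Split `items` into batches of `size` and process each batch with one call,
// preserving input order in the flattened result.
async function inBatches<T, R>(
  items: T[],
  size: number,
  fn: (batch: T[]) => Promise<R[]>,
): Promise<R[]> {
  const out: R[] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(...(await fn(items.slice(i, i + size))));
  }
  return out;
}
```

For example, embedding 1,000 chunks in batches of 100 issues 10 API calls instead of 1,000.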
Memory System
The RAG package includes a powerful memory system for building context-aware AI applications. See the Memory System Guide for complete documentation.
Quick Example
import {
RAGPipelineWithMemory,
MemoryManager,
HybridMemory,
BufferMemory,
VectorMemory,
} from '@hazeljs/rag';
// Setup memory
const buffer = new BufferMemory({ maxSize: 20 });
const vectorMemory = new VectorMemory(vectorStore, embeddings);
const hybridMemory = new HybridMemory(buffer, vectorMemory);
const memoryManager = new MemoryManager(hybridMemory, {
maxConversationLength: 20,
summarizeAfter: 50,
entityExtraction: true,
});
// Create RAG with memory
const rag = new RAGPipelineWithMemory(
{ vectorStore, embeddingProvider: embeddings },
memoryManager,
llmFunction
);
// Query with conversation context
const response = await rag.queryWithMemory(
'What did we discuss about pricing?',
'session-123',
'user-456'
);
console.log(response.answer);
console.log('Memories:', response.memories);
console.log('History:', response.conversationHistory);
Memory Features
- Conversation Memory: Track multi-turn conversations with auto-summarization
- Entity Memory: Remember people, companies, and relationships
- Fact Storage: Store and recall facts semantically
- Working Memory: Temporary context for current tasks
- Hybrid Storage: Fast buffer + persistent vector storage
- Semantic Search: Find relevant memories using embeddings
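The buffer side of HybridMemory can be pictured as a bounded list of recent turns: once `maxSize` is reached, the oldest message is evicted (and, in the full system, persisted to vector storage). This is a conceptual sketch, not the package's internal implementation:

```typescript
interface Turn {
  role: 'user' | 'assistant';
  content: string;
}

// Bounded conversation buffer: keeps only the maxSize most recent turns.
class SimpleBufferMemory {
  private turns: Turn[] = [];

  constructor(private maxSize: number) {}

  add(turn: Turn): void {
    this.turns.push(turn);
    if (this.turns.length > this.maxSize) this.turns.shift(); // evict oldest
  }

  recent(): Turn[] {
    return [...this.turns];
  }
}
```

Keeping the buffer small bounds prompt size; the vector side of the hybrid store is what lets older, evicted context still be recalled semantically.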
Learn more in the Memory System Guide.
What's Next?
- Read the Document Loaders Guide for deep dives on every loader
- Explore the GraphRAG Guide for knowledge graph retrieval
- Explore the Memory System for context-aware AI
- Learn about AI Package for LLM integration with RAG
- Explore Caching to optimize RAG performance
- Check out Config for managing API keys
- Read the Vector Stores Guide for detailed setup
- See RAG Patterns for advanced techniques
- Compare RAG vs Agentic RAG to choose the right approach
- Explore Agentic RAG for autonomous retrieval strategies
API Reference
For complete API documentation, see the RAG API Reference.