RAG Package
The @hazeljs/rag package provides a comprehensive Retrieval-Augmented Generation (RAG) implementation with document loaders, knowledge graph retrieval (GraphRAG), memory management, and semantic search — everything you need to build production-grade AI knowledge bases.
Purpose
Building RAG applications requires integrating vector databases, managing embeddings, loading documents from diverse sources, implementing search strategies, and maintaining conversation context. The @hazeljs/rag package solves all of this in one place:
- 11 Document Loaders: TXT, Markdown, JSON, CSV, HTML, PDF, DOCX, web scraping, YouTube transcripts, GitHub repos, and inline text — all with a unified BaseDocumentLoader API
- GraphRAG: Knowledge graph-based retrieval that extracts entities and relationships, detects communities, and enables entity-centric (local) and thematic (global) search that outperforms flat cosine similarity
- 5 Vector Store Implementations: Memory, Pinecone, Qdrant, Weaviate, and ChromaDB with a unified interface
- Memory System: Conversation tracking, entity memory, fact storage, and working memory for context-aware AI
- Multiple Embedding Providers: OpenAI and Cohere embeddings with easy extensibility
- Advanced Retrieval Strategies: Hybrid search (vector + BM25), multi-query retrieval, and semantic search
- Intelligent Text Splitting: Multiple chunking strategies for optimal retrieval
- RAG + Memory Integration: Combine document retrieval with conversation history for enhanced context
- Decorator-Based API: @Embeddable, @SemanticSearch, @HybridSearch for declarative RAG
- Production-Ready: Battle-tested patterns with proper error handling and TypeScript support
Architecture
graph TD
A["Documents"] --> B["Text Splitter"]
B --> C["Chunks"]
C --> D["Embedding Provider<br/>(OpenAI, Cohere)"]
D --> E["Vector Embeddings"]
E --> F["Vector Store<br/>(Memory, Pinecone, Qdrant, etc.)"]
G["User Query"] --> H["Embedding Provider"]
H --> I["Query Vector"]
I --> J["Retrieval Strategy<br/>(Semantic, Hybrid, Multi-Query)"]
J --> F
F --> K["Ranked Results"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style B fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style C fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style D fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style E fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style F fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
Key Components
- RAG Pipeline: Orchestrates document indexing, query processing, and result retrieval
- Vector Stores: Pluggable storage backends for embeddings and documents
- Embedding Providers: Generate vector embeddings from text
- Retrieval Strategies: Advanced search algorithms (hybrid, multi-query, BM25)
- Text Splitters: Intelligent document chunking for optimal retrieval
- Decorators: @Embeddable, @SemanticSearch, @HybridSearch for declarative RAG
Advantages
Vector Store Flexibility
Start with in-memory storage for development, then seamlessly switch to Pinecone, Qdrant, Weaviate, or ChromaDB for production—all with the same API.
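As a sketch of that pattern, backend selection can live behind a single factory. The interface below is a simplified stand-in for the package's real VectorStore shape (the real one has more methods, such as addDocuments), and the store classes are illustrative stubs:

```typescript
// Simplified stand-ins for the package's VectorStore interface — illustrative only.
interface SearchResult {
  content: string;
  score: number;
}

interface VectorStore {
  initialize(): Promise<void>;
  search(query: string, opts?: { topK?: number }): Promise<SearchResult[]>;
}

class InMemoryStore implements VectorStore {
  async initialize(): Promise<void> {}
  async search(): Promise<SearchResult[]> {
    return [];
  }
}

class RemoteStore implements VectorStore {
  constructor(readonly url: string) {}
  async initialize(): Promise<void> {}
  async search(): Promise<SearchResult[]> {
    return [];
  }
}

// Callers depend only on VectorStore; the backend is a configuration decision.
function createVectorStore(env: 'development' | 'production'): VectorStore {
  return env === 'production'
    ? new RemoteStore('http://localhost:6333')
    : new InMemoryStore();
}
```

Because application code only ever sees the interface, promoting from development to production is a one-line configuration change rather than a refactor.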
Advanced Retrieval
Built-in support for hybrid search (combining vector and keyword search), multi-query retrieval (generating multiple search queries), and BM25 keyword ranking.
Developer Experience
Decorator-based API means you can add RAG capabilities with a single decorator. No need to manage vector stores, embeddings, or search logic manually.
Production Ready
Proper error handling, TypeScript support, connection pooling, and battle-tested patterns make it ready for production use.
Extensible
Easy to add custom vector stores, embedding providers, or retrieval strategies by implementing simple interfaces.
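For example, a custom embedding provider only needs embed and embedBatch (the two methods the built-in providers expose). The interface shape here is a simplified stand-in, and the deterministic character-hashing embedder is a toy for demonstration only — not a usable embedding model:

```typescript
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
}

// Toy embedder: buckets character codes into a fixed-size, L2-normalized vector.
class HashingEmbeddings implements EmbeddingProvider {
  constructor(private readonly dimensions = 64) {}

  async embed(text: string): Promise<number[]> {
    const vec = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vec[text.charCodeAt(i) % this.dimensions] += 1;
    }
    const norm = Math.hypot(...vec) || 1; // avoid division by zero for ''
    return vec.map((v) => v / norm);
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map((t) => this.embed(t)));
  }
}
```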
Installation
# Core RAG package
npm install @hazeljs/rag
# Peer dependencies (choose based on your needs)
npm install openai # For OpenAI embeddings and GraphRAG LLM
# Optional: Vector store clients (install only what you need)
npm install @pinecone-database/pinecone # For Pinecone
npm install @qdrant/js-client-rest # For Qdrant
npm install weaviate-ts-client # For Weaviate
npm install chromadb # For ChromaDB
Optional Document Loader Dependencies:
# For Cohere embeddings
npm install cohere-ai
# For PDF loading (PdfLoader)
npm install pdf-parse
# For Word document loading (DocxLoader)
npm install mammoth
# For CSS-selector web scraping (WebLoader / HtmlFileLoader)
npm install cheerio
Quick Start
Basic RAG Pipeline
The simplest way to get started with RAG:
import {
RAGPipeline,
OpenAIEmbeddings,
MemoryVectorStore
} from '@hazeljs/rag';
// Setup embeddings provider
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536,
});
// Create vector store
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
// Create RAG pipeline
const rag = new RAGPipeline({
vectorStore,
embeddingProvider: embeddings,
topK: 5, // Return top 5 results
});
await rag.initialize();
// Index documents
await rag.addDocuments([
{
content: 'HazelJS is a modern TypeScript framework for building scalable applications.',
metadata: { category: 'framework', source: 'docs' },
},
{
content: 'The RAG package provides semantic search and vector database integration.',
metadata: { category: 'rag', source: 'docs' },
},
]);
// Query with semantic search
const results = await rag.search('What is HazelJS?', { topK: 3 });
console.log('Search Results:');
results.forEach((result, index) => {
console.log(`${index + 1}. ${result.content}`);
console.log(` Score: ${result.score}`);
console.log(` Metadata:`, result.metadata);
});
Document Loaders
Document loaders are the entry point of every RAG pipeline. They read data from any source and return a standardised Document[] array ready for chunking and indexing. Every real-world application needs them immediately — @hazeljs/rag ships 11 built-in loaders covering every common source.
Loader overview
| Loader | Source | Extra install? |
|---|---|---|
| TextFileLoader | .txt files | — |
| MarkdownFileLoader | .md / .mdx with heading splits and YAML front-matter | — |
| JSONFileLoader | .json arrays or objects with textKey / jsonPointer extraction | — |
| CSVFileLoader | .csv rows mapped to documents with configurable columns | — |
| HtmlFileLoader | .html tag stripping; CSS selectors via cheerio | optional cheerio |
| DirectoryLoader | Recursive directory walk, auto-detects loader by extension | — |
| PdfLoader | PDFs via pdf-parse; split by page or as one document | npm i pdf-parse |
| DocxLoader | Word documents via mammoth; plain text or HTML output | npm i mammoth |
| WebLoader | HTTP page scraping; CSS selectors via cheerio; retry/timeout | optional cheerio |
| YouTubeTranscriptLoader | YouTube transcript download (no API key); segment by duration | — |
| GitHubLoader | GitHub REST API; filter by directory, extension, maxFiles | — |
File loaders
import {
TextFileLoader,
MarkdownFileLoader,
JSONFileLoader,
CSVFileLoader,
HtmlFileLoader,
} from '@hazeljs/rag';
// Plain text — one document per file
const textDocs = await new TextFileLoader({
filePath: './docs/notes.txt',
}).load();
// Markdown — split into one document per heading section
const mdDocs = await new MarkdownFileLoader({
filePath: './docs/guide.md',
splitByHeading: true, // creates one Document per H2/H3 section
parseYamlFrontMatter: true, // front-matter fields become metadata
}).load();
// mdDocs[0].metadata.heading === 'Installation'
// JSON — extract a specific field as the document content
const jsonDocs = await new JSONFileLoader({
filePath: './data/articles.json',
textKey: 'body', // use 'body' field as content
// jsonPointer: '/items', // navigate nested JSON with a JSON Pointer
}).load();
// CSV — map rows to documents; choose which columns become content vs metadata
const csvDocs = await new CSVFileLoader({
filePath: './data/faqs.csv',
contentColumns: ['question', 'answer'],
metadataColumns: ['category'],
}).load();
// HTML — strips all tags, extracts title
const htmlDocs = await new HtmlFileLoader({
filePath: './docs/index.html',
selector: 'main', // optional: only extract content inside <main>
}).load();
DirectoryLoader — bulk ingest
DirectoryLoader walks a directory recursively and automatically delegates each file to the right typed loader. This is the fastest way to ingest a knowledge base from disk:
import { DirectoryLoader } from '@hazeljs/rag';
const docs = await new DirectoryLoader({
dirPath: './knowledge-base',
recursive: true,
// extensions: ['.md', '.txt'], // filter to specific types
// exclude: ['**/node_modules/**'],
}).load();
console.log(`Loaded ${docs.length} documents from ${[...new Set(docs.map(d => d.metadata?.source))].length} files`);
PDF and Word documents
import { PdfLoader, DocxLoader } from '@hazeljs/rag';
// PDF — one document per page or the whole file
const pdfDocs = await new PdfLoader({
filePath: './reports/annual-report.pdf',
splitByPage: true, // each page becomes its own Document
}).load();
// Word document
const wordDocs = await new DocxLoader({
filePath: './contracts/agreement.docx',
outputFormat: 'text', // 'text' (default) or 'html'
}).load();
WebLoader — scrape any URL
import { WebLoader } from '@hazeljs/rag';
// Single URL
const docs = await new WebLoader({
urls: ['https://hazeljs.com/docs'],
timeout: 10_000,
maxRetries: 3,
// selector: 'article', // optional: CSS selector (requires cheerio)
}).load();
// Multiple URLs in one call
const batchDocs = await new WebLoader({
urls: [
'https://hazeljs.com/docs/installation',
'https://hazeljs.com/blog/graphrag',
],
}).load();
YouTubeTranscriptLoader — no API key needed
import { YouTubeTranscriptLoader } from '@hazeljs/rag';
// Works with full URL or just the video ID
const transcriptDocs = await new YouTubeTranscriptLoader({
videoUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
segmentDuration: 60, // group transcript into 60-second chunks
}).load();
// Each doc has metadata: { videoId, startTime, endTime, source }
GitHubLoader — index entire repositories
import { GitHubLoader } from '@hazeljs/rag';
const repoDocs = await new GitHubLoader({
owner: 'hazeljs',
repo: 'hazel',
ref: 'main', // branch or tag
directory: 'docs', // only load this sub-directory
extensions: ['.md', '.mdx'], // only Markdown files
maxFiles: 100,
token: process.env.GITHUB_TOKEN, // optional; avoids 60 req/hr rate limit
}).load();
Custom loaders with @Loader and DocumentLoaderRegistry
Extend BaseDocumentLoader to add any data source. The @Loader decorator registers metadata for auto-detection:
import {
BaseDocumentLoader,
Loader,
DocumentLoaderRegistry,
} from '@hazeljs/rag';
@Loader({
name: 'NotionLoader',
description: 'Loads pages from a Notion database',
extensions: [],
mimeTypes: ['application/vnd.notion'],
})
export class NotionLoader extends BaseDocumentLoader {
constructor(private readonly databaseId: string) {
super();
}
async load() {
const pages = await fetchNotionDatabase(this.databaseId);
return pages.map((page) =>
this.createDocument(page.content, {
source: `notion:${this.databaseId}/${page.id}`,
title: page.title,
lastEdited: page.lastEditedTime,
}),
);
}
}
// Register once at startup — then DirectoryLoader and the registry can use it
DocumentLoaderRegistry.register(
NotionLoader,
(databaseId: string) => new NotionLoader(databaseId),
);
Full ingest pipeline
Putting it all together with the RAG pipeline:
import {
DirectoryLoader,
GitHubLoader,
WebLoader,
RAGPipeline,
OpenAIEmbeddings,
MemoryVectorStore,
RecursiveCharacterTextSplitter,
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_API_KEY });
const vectorStore = new MemoryVectorStore(embeddings);
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 800, chunkOverlap: 150 });
const pipeline = new RAGPipeline({ vectorStore, embeddingProvider: embeddings, textSplitter: splitter });
await pipeline.initialize();
// Load from multiple sources
const [localDocs, githubDocs, webDocs] = await Promise.all([
new DirectoryLoader({ dirPath: './knowledge-base', recursive: true }).load(),
new GitHubLoader({ owner: 'hazeljs', repo: 'hazel', directory: 'docs', extensions: ['.md'] }).load(),
new WebLoader({ urls: ['https://hazeljs.com/docs'] }).load(),
]);
// Index everything at once
const ids = await pipeline.addDocuments([...localDocs, ...githubDocs, ...webDocs]);
console.log(`Indexed ${ids.length} chunks`);
GraphRAG
GraphRAG extends traditional vector search by building a knowledge graph of entities and relationships extracted from your documents. Instead of searching raw text chunks by cosine similarity, it retrieves structured facts and cross-document themes — answering questions that flat vector search cannot.
See the full GraphRAG Guide for an in-depth walkthrough.
Why GraphRAG?
Traditional RAG retrieves the K most similar text chunks. This works well for narrow questions but fails for:
- Cross-document reasoning — "How do all the components in the system relate to each other?"
- Thematic questions — "What are the main architectural layers of this codebase?"
- Entity-relationship queries — "What does the AgentGraph depend on?"
GraphRAG solves this with two complementary retrieval modes:
| Mode | How it works | Best for |
|---|---|---|
| Local | Finds entities matching the query, traverses K hops in the knowledge graph, assembles entity + relationship context | Specific "what is / how does" questions |
| Global | Ranks LLM-generated community reports by relevance; assembles thematic summaries | Broad "what are the main themes / architecture" questions |
| Hybrid | Runs both in parallel, merges contexts, single LLM synthesis call | Best default — covers both dimensions |
Architecture
graph TD
A["Documents"] --> B["Text Chunks"]
B --> C["Entity Extractor<br/>(LLM)"]
C --> D["Knowledge Graph<br/>(GraphStore)"]
D --> E["Community Detector<br/>(Label Propagation)"]
E --> F["Community Summarizer<br/>(LLM Reports)"]
G["User Query"] --> H{"Search Mode"}
H -->|"local"| I["Seed Entity Lookup"]
H -->|"global"| J["Community Report Ranking"]
H -->|"hybrid"| K["Both in Parallel"]
I --> L["BFS Graph Traversal<br/>(K hops)"]
L --> M["Entity + Relationship Context"]
J --> N["Top-K Report Summaries"]
K --> O["Merged Context"]
M --> P["LLM Synthesis"]
N --> P
O --> P
P --> Q["Answer + Sources"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
style F fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style Q fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
Building the knowledge graph
import OpenAI from 'openai';
import {
GraphRAGPipeline,
DirectoryLoader,
} from '@hazeljs/rag';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Create the pipeline — provide an LLM function for extraction and synthesis
const graphRag = new GraphRAGPipeline({
llm: async (prompt) => {
const res = await openai.chat.completions.create({
model: 'gpt-4o-mini',
temperature: 0,
messages: [{ role: 'user', content: prompt }],
});
return res.choices[0].message.content ?? '';
},
extractionChunkSize: 2000, // max chars per LLM extraction call
generateCommunityReports: true, // produce LLM summaries per community cluster
maxCommunitySize: 15, // split communities larger than this
localSearchDepth: 2, // BFS hops for local search
localSearchTopK: 5, // seed entities per query
globalSearchTopK: 5, // community reports used in global search
});
// Load documents from any source
const docs = await new DirectoryLoader({ dirPath: './knowledge-base', recursive: true }).load();
// build() extracts entities, builds the graph, detects communities, and writes reports
const stats = await graphRag.build(docs);
console.log(stats);
// {
// documentsProcessed: 12,
// entitiesExtracted: 47,
// relationshipsExtracted: 63,
// communitiesDetected: 8,
// communityReportsGenerated: 8,
// duration: 18400,
// }
Local search — entity-centric
Best for specific, factual questions about named concepts, technologies, or processes:
const result = await graphRag.search('How does HazelJS dependency injection work?', {
mode: 'local',
depth: 2, // traverse up to 2 hops from seed entities
topK: 5, // start from 5 seed entities
});
console.log(result.answer);
// "HazelJS uses constructor injection. When the IoC container resolves
// a @Service(), it reads TypeScript metadata to identify constructor
// parameters and injects resolved instances automatically..."
console.log(result.entities.map(e => `${e.name} [${e.type}]`));
// ['Dependency Injection [CONCEPT]', 'IoC Container [TECHNOLOGY]',
// '@Service [FEATURE]', 'HazelJS [TECHNOLOGY]', ...]
console.log(result.relationships.map(r => `${r.type}: ${r.description}`));
// ['USES: HazelJS uses constructor injection pattern', ...]
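Conceptually, the K-hop traversal behind local search is a breadth-first search outward from the seed entities. The sketch below uses a hypothetical simplified adjacency-list graph, not the package's internal GraphStore:

```typescript
// Hypothetical simplified graph: adjacency list keyed by entity id.
type Graph = Map<string, string[]>;

// Collect every entity reachable within `depth` hops of the seed entities.
function kHopNeighborhood(graph: Graph, seeds: string[], depth: number): Set<string> {
  const visited = new Set<string>(seeds);
  let frontier = seeds;
  for (let hop = 0; hop < depth; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of graph.get(id) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next; // only newly discovered entities expand on the next hop
  }
  return visited;
}
```

With depth: 2, entities two hops away from a seed are included in the context while anything further is excluded — which is why raising depth widens (and lengthens) the assembled context.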
Global search — community reports
Best for broad questions about themes, architecture, or the overall scope of a knowledge base:
const result = await graphRag.search(
'What are the main architectural layers of the HazelJS framework?',
{
mode: 'global',
topK: 5, // include top 5 community reports by relevance
},
);
console.log(result.communities[0]);
// {
// communityId: 'community_0',
// title: 'HazelJS Core Infrastructure Layer',
// summary: 'This community represents the foundational layer of HazelJS...',
// findings: ['HazelJS Core provides HTTP and DI foundation', ...],
// rating: 9,
// }
Hybrid search — best default
Runs local and global in parallel and merges their contexts before a single LLM synthesis call:
const result = await graphRag.search(
'What vector stores does @hazeljs/rag support and how do I swap them?',
{
mode: 'hybrid', // default when mode is omitted
includeGraph: true, // include entities + relationships in result
includeCommunities: true,
},
);
console.log(`${result.mode} search in ${result.duration}ms`);
console.log(`Entities found: ${result.entities.length}`);
console.log(`Communities used: ${result.communities.length}`);
Entity and relationship types
The LLM extractor maps every concept to one of these canonical types, making the graph consistent and queryable:
Entity types: CONCEPT · TECHNOLOGY · PERSON · ORGANIZATION · PROCESS · FEATURE · EVENT · LOCATION · OTHER
Relationship types: USES · IMPLEMENTS · CREATED_BY · PART_OF · DEPENDS_ON · RELATED_TO · EXTENDS · CONFIGURES · TRIGGERS · PRODUCES · REPLACES · OTHER
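These vocabularies map naturally onto TypeScript union types. The shapes below mirror the entity and relationship fields shown by getGraph() further down, but they are illustrative definitions, not the package's exported types:

```typescript
// Hypothetical type definitions mirroring the canonical vocabularies above.
type EntityType =
  | 'CONCEPT' | 'TECHNOLOGY' | 'PERSON' | 'ORGANIZATION' | 'PROCESS'
  | 'FEATURE' | 'EVENT' | 'LOCATION' | 'OTHER';

type RelationshipType =
  | 'USES' | 'IMPLEMENTS' | 'CREATED_BY' | 'PART_OF' | 'DEPENDS_ON'
  | 'RELATED_TO' | 'EXTENDS' | 'CONFIGURES' | 'TRIGGERS' | 'PRODUCES'
  | 'REPLACES' | 'OTHER';

interface Entity {
  id: string;
  name: string;
  type: EntityType;
  description: string;
  sourceDocIds: string[];
}

interface Relationship {
  id: string;
  sourceId: string; // entity id
  targetId: string; // entity id
  type: RelationshipType;
  description: string;
  weight: number;
}
```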
Incremental updates
Add new documents to an existing graph without rebuilding from scratch:
// Add a new batch of documents to the existing graph
const updateStats = await graphRag.addDocuments(newDocs);
// Graph re-runs community detection and regenerates reports after each batch
Inspect the graph
The full knowledge graph is available for visualisation (D3.js, Cytoscape.js, etc.):
const graph = graphRag.getGraph();
// Entities
console.log([...graph.entities.values()].slice(0, 3));
// [{ id, name, type, description, sourceDocIds }, ...]
// Relationships
console.log([...graph.relationships.values()].slice(0, 3));
// [{ id, sourceId, targetId, type, description, weight }, ...]
// Community reports
console.log([...graph.communityReports.values()].map(r => r.title));
// ['HazelJS Core DI System', 'RAG Pipeline & Vector Stores', ...]
// Statistics
const stats = graphRag.getStats();
console.log(stats.entityTypeBreakdown);
// { TECHNOLOGY: 14, CONCEPT: 12, FEATURE: 9, PROCESS: 7, ... }
console.log(stats.topEntities.slice(0, 3));
// [{ name: 'HazelJS', connections: 12 }, ...]
GraphRAG vs traditional RAG
| | Traditional RAG | GraphRAG |
|---|---|---|
| Storage | Flat vector index | Knowledge graph + vector index |
| Retrieval unit | Text chunk | Entity + relationships + community |
| Cross-document reasoning | Limited | Native |
| Broad thematic questions | Poor | Excellent (community reports) |
| Specific entity questions | Good | Excellent (BFS traversal) |
| Setup cost | Low | Medium (LLM extraction pass) |
| Token cost per query | Low | Medium |
| Best use case | Q&A over focused docs | Multi-document knowledge bases |
Vector Stores
The RAG package supports 5 vector store implementations with a unified interface.
Memory Vector Store (Development)
In-memory storage with no external dependencies. Perfect for development and testing.
Advantages:
- Zero setup required
- Extremely fast
- No external dependencies
- Great for testing and CI/CD
Limitations:
- Data lost on restart
- Limited to available memory
- Not suitable for production
import { MemoryVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
// Use it
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });
Pinecone Vector Store (Production, Serverless)
Fully managed, serverless vector database with automatic scaling.
Advantages:
- Fully managed (no infrastructure)
- Auto-scaling
- Global distribution
- High performance
- Excellent for serverless deployments
Limitations:
- Paid service (free tier available)
- Network latency compared to self-hosted alternatives
import { PineconeVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new PineconeVectorStore(embeddings, {
apiKey: process.env.PINECONE_API_KEY,
environment: process.env.PINECONE_ENVIRONMENT,
indexName: 'my-knowledge-base',
});
await vectorStore.initialize();
// Same API as Memory store
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });
Setup:
- Sign up at pinecone.io
- Create an index with dimension matching your embeddings (1536 for OpenAI text-embedding-3-small)
- Get your API key and environment from the dashboard
Qdrant Vector Store (High-Performance, Self-Hosted)
Rust-based vector database optimized for speed and efficiency.
Advantages:
- Extremely fast (Rust-based)
- Advanced filtering capabilities
- Self-hosted (full control)
- Open-source
- Cost-effective for large datasets
Limitations:
- Requires infrastructure management
- Setup complexity
import { QdrantVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new QdrantVectorStore(embeddings, {
url: process.env.QDRANT_URL || 'http://localhost:6333',
collectionName: 'my-knowledge-base',
});
await vectorStore.initialize();
Setup with Docker:
docker run -p 6333:6333 qdrant/qdrant
Weaviate Vector Store (GraphQL, Flexible)
Open-source vector database with GraphQL API and advanced features.
Advantages:
- GraphQL API
- Flexible schema
- Built-in vectorization
- Hybrid search support
- Multi-tenancy
Limitations:
- Requires infrastructure
- Learning curve for GraphQL
import { WeaviateVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new WeaviateVectorStore(embeddings, {
host: process.env.WEAVIATE_HOST || 'http://localhost:8080',
className: 'MyKnowledgeBase',
});
await vectorStore.initialize();
Setup with Docker:
docker run -p 8080:8080 semitechnologies/weaviate:latest
ChromaDB Vector Store (Prototyping, Embedded)
Lightweight, embeddable vector database perfect for prototyping.
Advantages:
- Easy setup
- Lightweight
- Can run embedded or as a server
- Great for prototyping
- Python and JavaScript support
Limitations:
- Less mature than alternatives
- Limited scalability for very large datasets
import { ChromaVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new ChromaVectorStore(embeddings, {
url: process.env.CHROMA_URL || 'http://localhost:8000',
collectionName: 'my-knowledge-base',
});
await vectorStore.initialize();
// ChromaDB-specific features
const stats = await vectorStore.getStats();
console.log('Collection size:', stats.count);
const preview = await vectorStore.peek(5);
console.log('First 5 documents:', preview);
Setup with Docker:
docker run -p 8000:8000 chromadb/chroma
Vector Store Comparison
| Feature | Memory | Pinecone | Qdrant | Weaviate | ChromaDB |
|---|---|---|---|---|---|
| Setup | None | API Key | Docker | Docker | Docker |
| Persistence | ❌ | ✅ | ✅ | ✅ | ✅ |
| Scalability | Low | High | High | High | Medium |
| Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Cost | Free | Paid | Free (OSS) | Free (OSS) | Free (OSS) |
| Best For | Dev/Test | Production | High-perf | GraphQL | Prototyping |
| Metadata Filtering | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hybrid Search | ❌ | ✅ | ✅ | ✅ | ❌ |
| Multi-tenancy | ❌ | ✅ | ✅ | ✅ | ❌ |
Embedding Providers
Embedding providers convert text into vector representations for semantic search.
OpenAI Embeddings
State-of-the-art embeddings from OpenAI with multiple model options.
Models:
- text-embedding-3-small: 1536 dimensions, fast and cost-effective
- text-embedding-3-large: 3072 dimensions, highest quality
- text-embedding-ada-002: Legacy model, 1536 dimensions
import { OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536, // Optional: reduce dimensions for faster search
});
// Embed single text
const vector = await embeddings.embed('Hello world');
console.log('Vector dimensions:', vector.length);
// Embed multiple texts (batch)
const vectors = await embeddings.embedBatch([
'First document',
'Second document',
'Third document',
]);
Cohere Embeddings
Multilingual embeddings from Cohere with excellent performance.
Models:
- embed-english-v3.0: English-only, high quality
- embed-multilingual-v3.0: 100+ languages
- embed-english-light-v3.0: Faster, smaller model
import { CohereEmbeddings } from '@hazeljs/rag';
const embeddings = new CohereEmbeddings({
apiKey: process.env.COHERE_API_KEY,
model: 'embed-english-v3.0',
inputType: 'search_document', // or 'search_query'
});
const vector = await embeddings.embed('Hello world');
Retrieval Strategies
Advanced search strategies for better results.
Hybrid Search
Combines vector similarity search with BM25 keyword search for best results.
graph LR
A["Query"] --> B["Vector Search<br/>(Semantic)"]
A --> C["BM25 Search<br/>(Keyword)"]
B --> D["Score Fusion"]
C --> D
D --> E["Ranked Results"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style B fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
style E fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
How it works:
- Performs vector similarity search (semantic understanding)
- Performs BM25 keyword search (exact term matching)
- Normalizes scores from both methods
- Combines scores with configurable weights
- Returns re-ranked results
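The normalize-and-combine steps can be sketched as a small standalone score-fusion helper. This is an illustrative implementation of the idea, not the package's internal code:

```typescript
interface Scored {
  id: string;
  score: number;
}

// Min-max normalize one result list to [0, 1] so the two scales are comparable.
function normalize(results: Scored[]): Map<string, number> {
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min || 1; // guard against identical scores
  return new Map(results.map((r) => [r.id, (r.score - min) / range]));
}

// Weighted sum of normalized vector and keyword scores; a document missing
// from one list simply contributes 0 from that side.
function fuse(
  vectorResults: Scored[],
  keywordResults: Scored[],
  vectorWeight = 0.7,
  keywordWeight = 0.3,
): Scored[] {
  const v = normalize(vectorResults);
  const k = normalize(keywordResults);
  const ids = new Set([...v.keys(), ...k.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score: vectorWeight * (v.get(id) ?? 0) + keywordWeight * (k.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

A document that ranks highly in both searches ends up above one that dominates only a single list, which is exactly the behaviour the vectorWeight/keywordWeight options tune.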
import {
HybridSearchRetrieval,
MemoryVectorStore,
OpenAIEmbeddings
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const hybridSearch = new HybridSearchRetrieval(vectorStore, {
vectorWeight: 0.7, // 70% weight to semantic search
keywordWeight: 0.3, // 30% weight to keyword search
topK: 10,
});
// Add documents
await vectorStore.addDocuments(documents);
// Search with hybrid strategy
const results = await hybridSearch.search('machine learning algorithms', {
topK: 5,
});
Multi-Query Retrieval
Generates multiple query variations using an LLM to improve recall.
graph TD
A["Original Query"] --> B["LLM Query Generator"]
B --> C["Query Variation 1"]
B --> D["Query Variation 2"]
B --> E["Query Variation 3"]
C --> F["Vector Search"]
D --> F
E --> F
F --> G["Deduplicate & Rank"]
G --> H["Final Results"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style B fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style D fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style E fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style F fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
style G fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
style H fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
How it works:
- Takes user's original question
- Uses LLM to generate multiple variations
- Searches with each variation
- Deduplicates results
- Re-ranks by frequency and average score
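The deduplicate-and-rank steps can be sketched as a standalone merge function. This illustrates the idea (frequency first, average score as tie-breaker); the package's internal ranking may differ in detail:

```typescript
interface Hit {
  id: string;
  score: number;
}

// Merge hits from several query variations: a document retrieved by more
// variations ranks higher; ties break on average score.
function mergeMultiQuery(resultSets: Hit[][]): Hit[] {
  const acc = new Map<string, { count: number; total: number }>();
  for (const results of resultSets) {
    for (const hit of results) {
      const entry = acc.get(hit.id) ?? { count: 0, total: 0 };
      entry.count += 1;
      entry.total += hit.score;
      acc.set(hit.id, entry);
    }
  }
  return [...acc.entries()]
    .map(([id, { count, total }]) => ({ id, score: total / count, count }))
    .sort((a, b) => b.count - a.count || b.score - a.score)
    .map(({ id, score }) => ({ id, score }));
}
```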
import {
MultiQueryRetrieval,
MemoryVectorStore,
OpenAIEmbeddings
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const multiQuery = new MultiQueryRetrieval(vectorStore, {
llmApiKey: process.env.OPENAI_API_KEY,
numQueries: 3, // Generate 3 query variations
topK: 10,
});
// Add documents
await vectorStore.addDocuments(documents);
// Search with multiple query variations
const results = await multiQuery.search('How do I deploy my app?', {
topK: 5,
});
Text Splitters
Intelligent document chunking for optimal retrieval.
Recursive Character Text Splitter
Splits text recursively by trying different separators (paragraphs, sentences, words).
import { RecursiveCharacterTextSplitter } from '@hazeljs/rag';
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000, // Target chunk size in characters
chunkOverlap: 200, // Overlap between chunks for context
separators: ['\n\n', '\n', '. ', ' '], // Try these in order
});
const chunks = await splitter.splitText(longDocument);
console.log(`Split into ${chunks.length} chunks`);
chunks.forEach((chunk, i) => {
console.log(`Chunk ${i + 1}: ${chunk.substring(0, 50)}...`);
});
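The recursion itself fits in a few lines: try the coarsest separator first and fall back to finer ones only for pieces that are still over the size limit. This sketch is illustrative — the real splitter also merges adjacent small pieces and applies chunkOverlap, which are omitted here:

```typescript
// Recursive splitting sketch: coarse separators first, hard cut as last resort.
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = ['\n\n', '\n', '. ', ' '],
): string[] {
  if (text.length <= chunkSize) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separators left: hard-cut at the size limit.
    const chunks: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      chunks.push(text.slice(i, i + chunkSize));
    }
    return chunks;
  }
  return text
    .split(sep)
    .filter((piece) => piece.length > 0)
    .flatMap((piece) =>
      piece.length > chunkSize ? recursiveSplit(piece, chunkSize, rest) : [piece],
    );
}
```

Splitting at paragraph and sentence boundaries first is what keeps semantically related text together in one chunk, which is why this strategy usually retrieves better than a fixed-width cut.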
Character Text Splitter
Simple character-based splitting with overlap.
import { CharacterTextSplitter } from '@hazeljs/rag';
const splitter = new CharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 50,
separator: '\n',
});
const chunks = await splitter.splitText(document);
Token Text Splitter
Splits by token count (useful for LLM context windows).
import { TokenTextSplitter } from '@hazeljs/rag';
const splitter = new TokenTextSplitter({
chunkSize: 512, // Max tokens per chunk
chunkOverlap: 50, // Overlap in tokens
encodingName: 'cl100k_base', // OpenAI encoding
});
const chunks = await splitter.splitText(document);
Decorators
Declarative RAG with decorators.
@Embeddable
Mark a class as embeddable for automatic vector storage.
import { Embeddable, Embedded } from '@hazeljs/rag';
@Embeddable({
vectorStore: 'memory',
embeddingProvider: 'openai',
})
class Article {
@Embedded()
title: string;
@Embedded()
content: string;
metadata: {
author: string;
date: Date;
};
}
@SemanticSearch
Add semantic search to a method.
import { Controller, Get } from '@hazeljs/common';
import { SemanticSearch } from '@hazeljs/rag';
@Controller('search')
class SearchController {
@Get()
@SemanticSearch({
vectorStore: 'pinecone',
topK: 5,
})
async search(@Query('q') query: string) {
// Results automatically injected
return { query, results: this.searchResults };
}
}
@HybridSearch
Add hybrid search (vector + keyword) to a method.
import { Controller, Get } from '@hazeljs/common';
import { HybridSearch } from '@hazeljs/rag';
@Controller('search')
class SearchController {
@Get('hybrid')
@HybridSearch({
vectorStore: 'qdrant',
vectorWeight: 0.7,
keywordWeight: 0.3,
topK: 10,
})
async hybridSearch(@Query('q') query: string) {
return { query, results: this.searchResults };
}
}
Best Practices
Choose the Right Vector Store
- Development: Use MemoryVectorStore for fast iteration
- Production (Serverless): Use PineconeVectorStore for zero infrastructure
- Production (Self-Hosted): Use QdrantVectorStore for performance and cost
- Prototyping: Use ChromaVectorStore for quick setup
Optimize Chunk Size
// For Q&A: Smaller chunks (200-500 chars)
const qaChunks = new RecursiveCharacterTextSplitter({
chunkSize: 300,
chunkOverlap: 50,
});
// For summarization: Larger chunks (1000-2000 chars)
const summaryChunks = new RecursiveCharacterTextSplitter({
chunkSize: 1500,
chunkOverlap: 200,
});
Use Metadata Filtering
// Add metadata when indexing
await vectorStore.addDocuments([
{
content: 'Document content',
metadata: {
category: 'technical',
date: '2024-01-01',
author: 'John Doe',
},
},
]);
// Filter during search
const results = await vectorStore.search('query', {
topK: 5,
filter: {
category: 'technical',
date: { $gte: '2024-01-01' },
},
});
Implement Caching
import { CacheService } from '@hazeljs/cache';
class RAGService {
constructor(
private vectorStore: VectorStore,
private cache: CacheService,
) {}
async search(query: string) {
const cacheKey = `search:${query}`;
// Check cache first
const cached = await this.cache.get(cacheKey);
if (cached) return cached;
// Perform search
const results = await this.vectorStore.search(query);
// Cache results
await this.cache.set(cacheKey, results, 3600); // 1 hour
return results;
}
}
Monitor Performance
async function searchWithMetrics(query: string) {
const start = Date.now();
try {
const results = await vectorStore.search(query);
const duration = Date.now() - start;
console.log(`Search completed in ${duration}ms`);
console.log(`Found ${results.length} results`);
return results;
} catch (error) {
console.error('Search failed:', error);
throw error;
}
}
Troubleshooting
Connection Errors
// Add retry logic
async function connectWithRetry(vectorStore: VectorStore, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
await vectorStore.initialize();
console.log('Connected successfully');
return;
} catch (error) {
console.log(`Connection attempt ${i + 1} failed`);
if (i === maxRetries - 1) throw error;
await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
}
}
}
Dimension Mismatch
// Ensure embedding dimensions match vector store configuration
// OpenAI text-embedding-3-small = 1536 dimensions
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536, // Must match index
});
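A cheap sanity check at startup catches dimension mismatches before they surface as cryptic vector-store errors. This is a sketch; `indexDimensions` here stands for whatever dimension your store's index was created with:

```typescript
// Fail fast if an embedding's length doesn't match the index configuration.
function assertDimensions(embedding: number[], indexDimensions: number): void {
  if (embedding.length !== indexDimensions) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, but the index expects ${indexDimensions}`,
    );
  }
}

// e.g. run once against the first embedding produced at startup
assertDimensions(new Array(1536).fill(0), 1536); // ok
```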
Docker Setup for Self-Hosted Stores
# Qdrant
docker run -p 6333:6333 qdrant/qdrant
# Weaviate
docker run -p 8080:8080 semitechnologies/weaviate:latest
# ChromaDB
docker run -p 8000:8000 chromadb/chroma
Low Search Quality
- Increase chunk overlap: More context between chunks
- Adjust chunk size: Smaller chunks for precise retrieval
- Use hybrid search: Combine semantic and keyword search
- Add metadata filtering: Narrow down search scope
- Try multi-query retrieval: Generate multiple search variations
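Multi-query retrieval runs several phrasings of the same question and merges the ranked result lists. Reciprocal rank fusion is a common, score-free way to do the merge; the sketch below is self-contained (generating the query variations with an LLM is elided):

```typescript
// Reciprocal rank fusion: a document's score is the sum of 1 / (k + rank)
// over every result list it appears in. k = 60 is the conventional constant.
function reciprocalRankFusion(resultLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of resultLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// 'b' ranks first because it appears in both lists
console.log(reciprocalRankFusion([['a', 'b'], ['b', 'c']]));
```

Documents retrieved by several query variations rise to the top, which makes the final ranking robust to any single badly-phrased query.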
High Latency
- Use batch operations: Process multiple documents at once
- Cache embeddings: Store embeddings with documents
- Optimize topK: Request fewer results
- Use production vector stores: Pinecone, Qdrant, or Weaviate
- Enable connection pooling: For self-hosted databases
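Batching is usually the biggest latency and cost win when indexing: one embedding request per batch of documents instead of one per document. A generic batching helper might look like this (illustrative; plug in your embedding provider's bulk call as `fn`):

```typescript
// Split `items` into batches of `size` and process each batch with one call,
// preserving input order in the flattened result.
async function inBatches<T, R>(
  items: T[],
  size: number,
  fn: (batch: T[]) => Promise<R[]>,
): Promise<R[]> {
  const out: R[] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(...(await fn(items.slice(i, i + size))));
  }
  return out;
}
```

For example, embedding 1,000 chunks in batches of 100 issues 10 API calls instead of 1,000.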
Memory System
The RAG package includes a powerful memory system for building context-aware AI applications. See the Memory System Guide for complete documentation.
Quick Example
import {
RAGPipelineWithMemory,
MemoryManager,
HybridMemory,
BufferMemory,
VectorMemory,
} from '@hazeljs/rag';
// Setup memory
const buffer = new BufferMemory({ maxSize: 20 });
const vectorMemory = new VectorMemory(vectorStore, embeddings);
const hybridMemory = new HybridMemory(buffer, vectorMemory);
const memoryManager = new MemoryManager(hybridMemory, {
maxConversationLength: 20,
summarizeAfter: 50,
entityExtraction: true,
});
// Create RAG with memory
const rag = new RAGPipelineWithMemory(
{ vectorStore, embeddingProvider: embeddings },
memoryManager,
llmFunction
);
// Query with conversation context
const response = await rag.queryWithMemory(
'What did we discuss about pricing?',
'session-123',
'user-456'
);
console.log(response.answer);
console.log('Memories:', response.memories);
console.log('History:', response.conversationHistory);
Memory Features
- Conversation Memory: Track multi-turn conversations with auto-summarization
- Entity Memory: Remember people, companies, and relationships
- Fact Storage: Store and recall facts semantically
- Working Memory: Temporary context for current tasks
- Hybrid Storage: Fast buffer + persistent vector storage
- Semantic Search: Find relevant memories using embeddings
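The buffer side of HybridMemory can be pictured as a bounded list of recent turns: once `maxSize` is reached, the oldest message is evicted (and, in the full system, persisted to vector storage). This is a conceptual sketch, not the package's internal implementation:

```typescript
interface Turn {
  role: 'user' | 'assistant';
  content: string;
}

// Bounded conversation buffer: keeps only the maxSize most recent turns.
class SimpleBufferMemory {
  private turns: Turn[] = [];

  constructor(private maxSize: number) {}

  add(turn: Turn): void {
    this.turns.push(turn);
    if (this.turns.length > this.maxSize) this.turns.shift(); // evict oldest
  }

  recent(): Turn[] {
    return [...this.turns];
  }
}
```

Keeping the buffer small bounds prompt size; the vector side of the hybrid store is what lets older, evicted context still be recalled semantically.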
Learn more in the Memory System Guide.
What's Next?
- Read the Document Loaders Guide for deep dives on every loader
- Explore the GraphRAG Guide for knowledge graph retrieval
- Explore the Memory System for context-aware AI
- Learn about AI Package for LLM integration with RAG
- Explore Caching to optimize RAG performance
- Check out Config for managing API keys
- Read the Vector Stores Guide for detailed setup
- See RAG Patterns for advanced techniques
- Compare RAG vs Agentic RAG to choose the right approach
- Explore Agentic RAG for autonomous retrieval strategies
API Reference
For complete API documentation, see the RAG API Reference.