
HazelJS RAG Package


@hazeljs/rag provides Retrieval-Augmented Generation for HazelJS applications with document loaders, knowledge graph retrieval (GraphRAG), memory management, vector stores, and semantic search.

Quick Reference

  • Purpose: @hazeljs/rag provides document loading, chunking, embedding, vector storage, semantic search, hybrid search, GraphRAG, and memory management for building RAG applications in HazelJS.
  • When to use: Use @hazeljs/rag when a HazelJS application needs to retrieve relevant documents before LLM generation, build a knowledge base, or maintain conversation memory with semantic retrieval. Use @hazeljs/ai alone for simple LLM calls without document retrieval.
  • Key concepts: RAGPipeline, Reranker, document loaders (11 built-in), chunking strategies, OpenAIEmbeddings/CohereEmbeddings, vector stores (Memory, Pinecone, Qdrant, Weaviate, ChromaDB), @SemanticSearch decorator, @HybridSearch decorator, GraphRAG (knowledge graph), MemoryManager, RagModule.
  • Inputs: Documents (text, PDF, web, etc.), user queries, embedding configuration, vector store configuration.
  • Outputs: Retrieved document chunks ranked by relevance, context-grounded LLM responses, knowledge graph entities and relationships.
  • Dependencies: @hazeljs/core, @hazeljs/ai (for embeddings and LLM generation), a vector store provider.
  • Common patterns: Load documents → chunk → embed → index in vector store → query with @SemanticSearch or RAGPipeline → pass retrieved context to LLM → generate response.
  • Common mistakes: Not chunking documents before indexing (large documents produce poor embeddings); using in-memory vector store in production (not persistent); not adding metadata to documents; setting topK too high (noise) or too low (missing context).

Purpose

Building RAG applications requires integrating vector databases, managing embeddings, loading documents from diverse sources, implementing search strategies, and maintaining conversation context. The @hazeljs/rag package solves all of this in one place:

  • 11 Document Loaders: TXT, Markdown, JSON, CSV, HTML, PDF, DOCX, web scraping, YouTube transcripts, GitHub repos, and inline text — all with a unified BaseDocumentLoader API
  • GraphRAG: Knowledge graph-based retrieval that extracts entities and relationships, detects communities, and enables entity-centric (local) and thematic (global) search that outperforms flat cosine similarity
  • 5 Vector Store Implementations: Memory, Pinecone, Qdrant, Weaviate, and ChromaDB with a unified interface
  • Memory System: Conversation tracking, entity memory, fact storage, and working memory for context-aware AI
  • Multiple Embedding Providers: OpenAI and Cohere embeddings with easy extensibility
  • Advanced Retrieval Strategies: Hybrid search (vector + BM25), multi-query retrieval, and semantic search
  • Intelligent Text Splitting: Multiple chunking strategies for optimal retrieval
  • RAG + Memory Integration: Combine document retrieval with conversation history for enhanced context
  • Decorator-Based API: @Embeddable, @SemanticSearch, @HybridSearch for declarative RAG
  • Production-Ready: Battle-tested patterns with proper error handling and TypeScript support

Architecture

graph TD
  A["Documents"] --> B["Text Splitter"]
  B --> C["Chunks"]
  C --> D["Embedding Provider<br/>(OpenAI, Cohere)"]
  D --> E["Vector Embeddings"]
  E --> F["Vector Store<br/>(Memory, Pinecone, Qdrant, etc.)"]
  G["User Query"] --> H["Embedding Provider"]
  H --> I["Query Vector"]
  I --> J["Retrieval Strategy<br/>(Semantic, Hybrid, Multi-Query)"]
  J --> F
  F --> K["Initial Results"]
  K --> L["Reranker<br/>(Cohere Rerank 3)"]
  L --> M["Ranked Results"]
  
  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style B fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style C fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style D fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style E fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style F fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff

Key Components

  • RAG Pipeline: Orchestrates document indexing, query processing, and result retrieval
  • Vector Stores: Pluggable storage backends for embeddings and documents
  • Embedding Providers: Generate vector embeddings from text
  • Retrieval Strategies: Advanced search algorithms (hybrid, multi-query, BM25)
  • Rerankers: Pluggable re-ordering of search results for high-precision retrieval (Cohere)
  • Text Splitters: Intelligent document chunking for optimal retrieval
  • Decorators: @Embeddable, @SemanticSearch, @HybridSearch for declarative RAG

Advantages

Vector Store Flexibility

Start with in-memory storage for development, then seamlessly switch to Pinecone, Qdrant, Weaviate, or ChromaDB for production—all with the same API.
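Because every store implements the same interface, the swap is a one-line change. A sketch using the constructors shown later on this page (the NODE_ENV toggle is just one way to wire it):

```typescript
import {
  RAGPipeline,
  OpenAIEmbeddings,
  MemoryVectorStore,
  QdrantVectorStore,
} from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_API_KEY });

// Development: zero-setup in-memory store
const devStore = new MemoryVectorStore(embeddings);

// Production: same interface, different backend — no other code changes
const prodStore = new QdrantVectorStore(embeddings, {
  url: process.env.QDRANT_URL || 'http://localhost:6333',
  collectionName: 'my-knowledge-base',
});

// The pipeline only sees the shared vector store interface
const vectorStore = process.env.NODE_ENV === 'production' ? prodStore : devStore;
const rag = new RAGPipeline({ vectorStore, embeddingProvider: embeddings });
await rag.initialize();
```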

Advanced Retrieval

Built-in support for hybrid search (combining vector and keyword search), multi-query retrieval (generating multiple search queries), and BM25 keyword ranking.
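The package's BM25 internals aren't shown here, but the keyword half of hybrid search is easy to picture. A self-contained sketch of the classic Okapi BM25 scoring formula (illustrative only):

```typescript
// Minimal Okapi BM25 scorer — illustrative, not the package's implementation.
type Doc = string[]; // a document as a list of tokens

function bm25Score(query: string[], doc: Doc, docs: Doc[], k1 = 1.5, b = 0.75): number {
  const N = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.length, 0) / N;
  let score = 0;
  for (const term of query) {
    const df = docs.filter((d) => d.includes(term)).length; // document frequency
    if (df === 0) continue;
    const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);  // smoothed IDF
    const tf = doc.filter((t) => t === term).length;         // term frequency
    score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (doc.length / avgdl)));
  }
  return score;
}

const corpus: Doc[] = [
  ['machine', 'learning', 'algorithms'],
  ['cooking', 'recipes'],
];
// The ML document scores higher for an ML query; the other scores 0
console.log(bm25Score(['machine', 'learning'], corpus[0], corpus));
console.log(bm25Score(['machine', 'learning'], corpus[1], corpus));
```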

Semantic Reranking

High-precision retrieval with built-in support for Cohere Rerank 3. Vector search finds broadly relevant documents; a reranker then picks the exact needles out of that haystack, sharply reducing the irrelevant context that leads to hallucinations.

Developer Experience

Decorator-based API means you can add RAG capabilities with a single decorator. No need to manage vector stores, embeddings, or search logic manually.

Production Ready

Proper error handling, TypeScript support, connection pooling, and battle-tested patterns make it ready for production use.

Extensible

Easy to add custom vector stores, embedding providers, or retrieval strategies by implementing simple interfaces.
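For example, a custom embedding provider only needs embed() and embedBatch(). The interface name below is assumed for illustration (it mirrors the two methods OpenAIEmbeddings exposes later on this page); the deterministic hash embedding makes it handy for unit tests:

```typescript
// A toy deterministic embedding provider, useful in unit tests.
// The EmbeddingProvider interface shape is assumed for illustration.
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
}

// Deterministic, unit-normalized "embedding" from character codes
export function hashEmbed(text: string, dimensions = 8): number[] {
  const vec = new Array(dimensions).fill(0);
  for (let i = 0; i < text.length; i++) {
    vec[i % dimensions] += text.charCodeAt(i);
  }
  const norm = Math.hypot(...vec) || 1; // avoid division by zero for ''
  return vec.map((v) => v / norm);
}

export class FakeEmbeddings implements EmbeddingProvider {
  async embed(text: string): Promise<number[]> {
    return hashEmbed(text);
  }
  async embedBatch(texts: string[]): Promise<number[][]> {
    return texts.map((t) => hashEmbed(t));
  }
}
```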

Installation

# Core RAG package
npm install @hazeljs/rag

# Peer dependencies (choose based on your needs)
npm install openai  # For OpenAI embeddings and GraphRAG LLM

# Optional: Vector store clients (install only what you need)
npm install @pinecone-database/pinecone  # For Pinecone
npm install @qdrant/js-client-rest       # For Qdrant
npm install weaviate-ts-client           # For Weaviate
npm install chromadb                     # For ChromaDB

Optional Document Loader Dependencies:

# For Cohere embeddings
npm install cohere-ai

# For PDF loading (PdfLoader)
npm install pdf-parse

# For Word document loading (DocxLoader)
npm install mammoth

# For CSS-selector web scraping (WebLoader / HtmlFileLoader)
npm install cheerio

Quick Start

Fastest Way: RAGPipeline.from()

The RAGPipeline.from() factory is the easiest way to get started:

import { RAGPipeline } from '@hazeljs/rag';

// One-liner setup with sensible defaults
const pipeline = RAGPipeline.from({
  provider: 'openai',  // or 'cohere'
  apiKey: process.env.OPENAI_API_KEY,  // Falls back to env var
  topK: 5,
  llm: async (prompt) => {
    // Your LLM function for answer generation
    const response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'gpt-4',
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    const data = await response.json();
    return data.choices[0].message.content;
  },
  reranker: 'cohere', // Enable Cohere Rerank 3 for high-precision retrieval
});

// Initialize and use
await pipeline.initialize();

// Universal document ingestion - auto-detects file type
await pipeline.addDocuments([
  { content: 'HazelJS is a TypeScript framework...', metadata: { source: 'docs' } },
]);

// Query the knowledge base
const result = await pipeline.query('What is HazelJS?');
console.log(result.answer);

Universal Document Ingestion

The RAGService provides a universal ingest() method that auto-detects file types:

import { RAGService } from '@hazeljs/rag';

const rag = new RAGService({
  vectorStore,
  embeddingProvider,
  llmFunction,
});

// Auto-detects and loads any supported format
await rag.ingest('./docs/guide.pdf');           // PDF
await rag.ingest('./data/faq.csv');             // CSV
await rag.ingest('https://example.com/page');   // Web page
await rag.ingest('./knowledge-base/');          // Entire directory

// Then query
const { answer, sources } = await rag.ask('What is the pricing?');

Manual Setup (Advanced)

For more control, set up the pipeline manually:

import { 
  RAGPipeline, 
  OpenAIEmbeddings, 
  MemoryVectorStore 
} from '@hazeljs/rag';

// Setup embeddings provider
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
  dimensions: 1536,
});

// Create vector store
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();

// Create RAG pipeline
const rag = new RAGPipeline({
  vectorStore,
  embeddingProvider: embeddings,
  topK: 5, // Return top 5 results
});

await rag.initialize();

// Index documents
await rag.addDocuments([
  {
    content: 'HazelJS is a modern TypeScript framework for building scalable applications.',
    metadata: { category: 'framework', source: 'docs' },
  },
  {
    content: 'The RAG package provides semantic search and vector database integration.',
    metadata: { category: 'rag', source: 'docs' },
  },
]);

// Query with semantic search
const results = await rag.search('What is HazelJS?', { topK: 3 });

console.log('Search Results:');
results.forEach((result, index) => {
  console.log(`${index + 1}. ${result.content}`);
  console.log(`   Score: ${result.score}`);
  console.log(`   Metadata:`, result.metadata);
});

Document Loaders

Document loaders are the entry point of every RAG pipeline. They read data from a source and return a standardized Document[] array ready for chunking and indexing. @hazeljs/rag ships 11 built-in loaders covering the most common sources.

Loader overview

| Loader | Source | Extra install? |
|---|---|---|
| TextFileLoader | .txt files | — |
| MarkdownFileLoader | .md / .mdx with heading splits and YAML front-matter | — |
| JSONFileLoader | .json arrays or objects with textKey / jsonPointer extraction | — |
| CSVFileLoader | .csv rows mapped to documents with configurable columns | — |
| HtmlFileLoader | .html tag stripping; CSS selectors via cheerio | optional cheerio |
| DirectoryLoader | Recursive directory walk, auto-detects loader by extension | — |
| PdfLoader | PDFs via pdf-parse; split by page or as one document | npm i pdf-parse |
| DocxLoader | Word documents via mammoth; plain text or HTML output | npm i mammoth |
| WebLoader | HTTP page scraping; CSS selectors via cheerio; retry/timeout | optional cheerio |
| YouTubeTranscriptLoader | YouTube transcript download (no API key); segment by duration | — |
| GitHubLoader | GitHub REST API; filter by directory, extension, maxFiles | — |

File loaders

import {
  TextFileLoader,
  MarkdownFileLoader,
  JSONFileLoader,
  CSVFileLoader,
  HtmlFileLoader,
} from '@hazeljs/rag';

// Plain text — one document per file
const textDocs = await new TextFileLoader({
  filePath: './docs/notes.txt',
}).load();

// Markdown — split into one document per heading section
const mdDocs = await new MarkdownFileLoader({
  filePath: './docs/guide.md',
  splitByHeading: true,        // creates one Document per H2/H3 section
  parseYamlFrontMatter: true,  // front-matter fields become metadata
}).load();
// mdDocs[0].metadata.heading === 'Installation'

// JSON — extract a specific field as the document content
const jsonDocs = await new JSONFileLoader({
  filePath: './data/articles.json',
  textKey: 'body',             // use 'body' field as content
  // jsonPointer: '/items',    // navigate nested JSON with a JSON Pointer
}).load();

// CSV — map rows to documents; choose which columns become content vs metadata
const csvDocs = await new CSVFileLoader({
  filePath: './data/faqs.csv',
  contentColumns: ['question', 'answer'],
  metadataColumns: ['category'],
}).load();

// HTML — strips all tags, extracts title
const htmlDocs = await new HtmlFileLoader({
  filePath: './docs/index.html',
  selector: 'main',            // optional: only extract content inside <main>
}).load();

DirectoryLoader — bulk ingest

DirectoryLoader walks a directory recursively and automatically delegates each file to the right typed loader. This is the fastest way to ingest a knowledge base from disk:

import { DirectoryLoader } from '@hazeljs/rag';

const docs = await new DirectoryLoader({
  dirPath: './knowledge-base',
  recursive: true,
  // extensions: ['.md', '.txt'],   // filter to specific types
  // exclude: ['**/node_modules/**'],
}).load();

console.log(`Loaded ${docs.length} documents from ${[...new Set(docs.map(d => d.metadata?.source))].length} files`);

PDF and Word documents

import { PdfLoader, DocxLoader } from '@hazeljs/rag';

// PDF — one document per page or the whole file
const pdfDocs = await new PdfLoader({
  filePath: './reports/annual-report.pdf',
  splitByPage: true,   // each page becomes its own Document
}).load();

// Word document
const wordDocs = await new DocxLoader({
  filePath: './contracts/agreement.docx',
  outputFormat: 'text',  // 'text' (default) or 'html'
}).load();

WebLoader — scrape any URL

import { WebLoader } from '@hazeljs/rag';

// Single URL
const docs = await new WebLoader({
  urls: ['https://hazeljs.ai/docs'],
  timeout: 10_000,
  maxRetries: 3,
  // selector: 'article',   // optional: CSS selector (requires cheerio)
}).load();

// Multiple URLs in one call
const batchDocs = await new WebLoader({
  urls: [
    'https://hazeljs.ai/docs/installation',
    'https://hazeljs.ai/blog/graphrag',
  ],
}).load();

YouTubeTranscriptLoader — no API key needed

import { YouTubeTranscriptLoader } from '@hazeljs/rag';

// Works with full URL or just the video ID
const transcriptDocs = await new YouTubeTranscriptLoader({
  videoUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
  segmentDuration: 60,   // group transcript into 60-second chunks
}).load();

// Each doc has metadata: { videoId, startTime, endTime, source }

GitHubLoader — index entire repositories

import { GitHubLoader } from '@hazeljs/rag';

const repoDocs = await new GitHubLoader({
  owner: 'hazeljs',
  repo: 'hazel',
  ref: 'main',                  // branch or tag
  directory: 'docs',            // only load this sub-directory
  extensions: ['.md', '.mdx'],  // only Markdown files
  maxFiles: 100,
  token: process.env.GITHUB_TOKEN, // optional; avoids 60 req/hr rate limit
}).load();

Custom loaders with @Loader and DocumentLoaderRegistry

Extend BaseDocumentLoader to add any data source. The @Loader decorator registers metadata for auto-detection:

import {
  BaseDocumentLoader,
  Loader,
  DocumentLoaderRegistry,
} from '@hazeljs/rag';

@Loader({
  name: 'NotionLoader',
  description: 'Loads pages from a Notion database',
  extensions: [],
  mimeTypes: ['application/vnd.notion'],
})
export class NotionLoader extends BaseDocumentLoader {
  constructor(private readonly databaseId: string) {
    super();
  }

  async load() {
    const pages = await fetchNotionDatabase(this.databaseId);
    return pages.map((page) =>
      this.createDocument(page.content, {
        source: `notion:${this.databaseId}/${page.id}`,
        title: page.title,
        lastEdited: page.lastEditedTime,
      }),
    );
  }
}

// Register once at startup — then DirectoryLoader and the registry can use it
DocumentLoaderRegistry.register(
  NotionLoader,
  (databaseId: string) => new NotionLoader(databaseId),
);

Full ingest pipeline

Putting it all together with the RAG pipeline:

import {
  DirectoryLoader,
  GitHubLoader,
  WebLoader,
  RAGPipeline,
  OpenAIEmbeddings,
  MemoryVectorStore,
  RecursiveTextSplitter,
} from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_API_KEY });
const vectorStore = new MemoryVectorStore(embeddings);
const splitter = new RecursiveTextSplitter({ chunkSize: 800, chunkOverlap: 150 });

const pipeline = new RAGPipeline({ vectorStore, embeddingProvider: embeddings, textSplitter: splitter });
await pipeline.initialize();

// Load from multiple sources
const [localDocs, githubDocs, webDocs] = await Promise.all([
  new DirectoryLoader({ dirPath: './knowledge-base', recursive: true }).load(),
  new GitHubLoader({ owner: 'hazeljs', repo: 'hazel', directory: 'docs', extensions: ['.md'] }).load(),
  new WebLoader({ urls: ['https://hazeljs.ai/docs'] }).load(),
]);

// Index everything at once
const ids = await pipeline.addDocuments([...localDocs, ...githubDocs, ...webDocs]);
console.log(`Indexed ${ids.length} chunks`);

GraphRAG

GraphRAG extends traditional vector search by building a knowledge graph of entities and relationships extracted from your documents. Instead of searching raw text chunks by cosine similarity, it retrieves structured facts and cross-document themes — answering questions that flat vector search cannot.

See the full GraphRAG Guide for an in-depth walkthrough.

Why GraphRAG?

Traditional RAG retrieves the K most similar text chunks. This works well for narrow questions but fails for:

  • Cross-document reasoning — "How do all the components in the system relate to each other?"
  • Thematic questions — "What are the main architectural layers of this codebase?"
  • Entity-relationship queries — "What does the AgentGraph depend on?"

GraphRAG solves this with two complementary retrieval modes:

| Mode | How it works | Best for |
|---|---|---|
| Local | Finds entities matching the query, traverses K hops in the knowledge graph, assembles entity + relationship context | Specific "what is / how does" questions |
| Global | Ranks LLM-generated community reports by relevance; assembles thematic summaries | Broad "what are the main themes / architecture" questions |
| Hybrid | Runs both in parallel, merges contexts, single LLM synthesis call | Best default — covers both dimensions |

Architecture

graph TD
  A["Documents"] --> B["Text Chunks"]
  B --> C["Entity Extractor<br/>(LLM)"]
  C --> D["Knowledge Graph<br/>(GraphStore)"]
  D --> E["Community Detector<br/>(Label Propagation)"]
  E --> F["Community Summarizer<br/>(LLM Reports)"]

  G["User Query"] --> H{"Search Mode"}
  H -->|"local"| I["Seed Entity Lookup"]
  H -->|"global"| J["Community Report Ranking"]
  H -->|"hybrid"| K["Both in Parallel"]

  I --> L["BFS Graph Traversal<br/>(K hops)"]
  L --> M["Entity + Relationship Context"]
  J --> N["Top-K Report Summaries"]
  K --> O["Merged Context"]

  M --> P["LLM Synthesis"]
  N --> P
  O --> P
  P --> Q["Answer + Sources"]

  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style F fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style Q fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
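The community-detection step uses label propagation. A minimal illustrative version (not the package's implementation) shows the idea: every node starts in its own community and repeatedly adopts the majority label among its neighbors until labels stabilize:

```typescript
// Minimal label propagation — illustrative only, not the package's internals.
function labelPropagation(edges: Array<[string, string]>): Map<string, string> {
  const neighbors = new Map<string, string[]>();
  for (const [a, b] of edges) {
    neighbors.set(a, [...(neighbors.get(a) ?? []), b]);
    neighbors.set(b, [...(neighbors.get(b) ?? []), a]);
  }
  // Every node starts in its own community
  const label = new Map<string, string>(
    [...neighbors.keys()].map((n) => [n, n] as [string, string]),
  );
  for (let changed = true, iter = 0; changed && iter < 100; iter++) {
    changed = false;
    for (const [node, nbrs] of neighbors) {
      const counts = new Map<string, number>();
      for (const n of nbrs) {
        const l = label.get(n)!;
        counts.set(l, (counts.get(l) ?? 0) + 1);
      }
      if (counts.size === 0) continue;
      const best = [...counts.entries()].sort((x, y) => y[1] - x[1])[0][0];
      if (best !== label.get(node)) {
        label.set(node, best);
        changed = true;
      }
    }
  }
  return label;
}

// Two triangles joined by one bridge edge → two communities
const labels = labelPropagation([
  ['a', 'b'], ['b', 'c'], ['a', 'c'], // community 1
  ['x', 'y'], ['y', 'z'], ['x', 'z'], // community 2
  ['c', 'x'],                         // bridge
]);
```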

Building the knowledge graph

import OpenAI from 'openai';
import {
  GraphRAGPipeline,
  DirectoryLoader,
} from '@hazeljs/rag';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Create the pipeline — provide an LLM function for extraction and synthesis
const graphRag = new GraphRAGPipeline({
  llm: async (prompt) => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      temperature: 0,
      messages: [{ role: 'user', content: prompt }],
    });
    return res.choices[0].message.content ?? '';
  },
  extractionChunkSize: 2000,      // max chars per LLM extraction call
  generateCommunityReports: true, // produce LLM summaries per community cluster
  maxCommunitySize: 15,           // split communities larger than this
  localSearchDepth: 2,            // BFS hops for local search
  localSearchTopK: 5,             // seed entities per query
  globalSearchTopK: 5,            // community reports used in global search
});

// Load documents from any source
const docs = await new DirectoryLoader({ dirPath: './knowledge-base', recursive: true }).load();

// build() extracts entities, builds the graph, detects communities, and writes community reports
const stats = await graphRag.build(docs);
console.log(stats);
// {
//   documentsProcessed: 12,
//   entitiesExtracted: 47,
//   relationshipsExtracted: 63,
//   communitiesDetected: 8,
//   communityReportsGenerated: 8,
//   duration: 18400,
// }

Local search — entity-centric

Best for specific, factual questions about named concepts, technologies, or processes:

const result = await graphRag.search('How does HazelJS dependency injection work?', {
  mode: 'local',
  depth: 2,   // traverse up to 2 hops from seed entities
  topK: 5,    // start from 5 seed entities
});

console.log(result.answer);
// "HazelJS uses constructor injection. When the IoC container resolves
//  a @Service(), it reads TypeScript metadata to identify constructor
//  parameters and injects resolved instances automatically..."

console.log(result.entities.map(e => `${e.name} [${e.type}]`));
// ['Dependency Injection [CONCEPT]', 'IoC Container [TECHNOLOGY]',
//  '@Service [FEATURE]', 'HazelJS [TECHNOLOGY]', ...]

console.log(result.relationships.map(r => `${r.type}: ${r.description}`));
// ['USES: HazelJS uses constructor injection pattern', ...]
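The traversal behind local mode can be pictured as a bounded breadth-first search. An illustrative sketch (not the package's internals) of collecting the K-hop neighborhood around seed entities:

```typescript
// Illustrative k-hop BFS over an entity graph.
// The Edge shape mirrors the relationship objects shown in "Inspect the graph".
type Edge = { sourceId: string; targetId: string };

function kHopNeighborhood(seeds: string[], edges: Edge[], depth: number): Set<string> {
  const adj = new Map<string, string[]>();
  for (const { sourceId, targetId } of edges) {
    adj.set(sourceId, [...(adj.get(sourceId) ?? []), targetId]);
    adj.set(targetId, [...(adj.get(targetId) ?? []), sourceId]);
  }
  const visited = new Set(seeds);
  let frontier = seeds;
  for (let hop = 0; hop < depth; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const nbr of adj.get(node) ?? []) {
        if (!visited.has(nbr)) {
          visited.add(nbr);
          next.push(nbr);
        }
      }
    }
    frontier = next;
  }
  return visited;
}

// depth 2 from 'DI' reaches the container and the metadata it reads,
// but not a node 3 hops away
const reached = kHopNeighborhood(['DI'], [
  { sourceId: 'DI', targetId: 'IoC Container' },
  { sourceId: 'IoC Container', targetId: 'TS Metadata' },
  { sourceId: 'TS Metadata', targetId: 'Compiler' },
], 2);
```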

Global search — community reports

Best for broad questions about themes, architecture, or the overall scope of a knowledge base:

const result = await graphRag.search(
  'What are the main architectural layers of the HazelJS framework?',
  {
    mode: 'global',
    topK: 5,  // include top 5 community reports by relevance
  },
);

console.log(result.communities[0]);
// {
//   communityId: 'community_0',
//   title: 'HazelJS Core Infrastructure Layer',
//   summary: 'This community represents the foundational layer of HazelJS...',
//   findings: ['HazelJS Core provides HTTP and DI foundation', ...],
//   rating: 9,
// }

Hybrid search — best default

Runs local and global in parallel and merges their contexts before a single LLM synthesis call:

const result = await graphRag.search(
  'What vector stores does @hazeljs/rag support and how do I swap them?',
  {
    mode: 'hybrid',      // default when mode is omitted
    includeGraph: true,  // include entities + relationships in result
    includeCommunities: true,
  },
);

console.log(`${result.mode} search in ${result.duration}ms`);
console.log(`Entities found: ${result.entities.length}`);
console.log(`Communities used: ${result.communities.length}`);

Entity and relationship types

The LLM extractor maps every concept to one of these canonical types, making the graph consistent and queryable:

Entity types: CONCEPT · TECHNOLOGY · PERSON · ORGANIZATION · PROCESS · FEATURE · EVENT · LOCATION · OTHER

Relationship types: USES · IMPLEMENTS · CREATED_BY · PART_OF · DEPENDS_ON · RELATED_TO · EXTENDS · CONFIGURES · TRIGGERS · PRODUCES · REPLACES · OTHER
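As a sketch, these vocabularies map naturally onto TypeScript union types. The type names below are hypothetical (the package's actual exports may differ); the field shapes match the entity and relationship objects shown under "Inspect the graph":

```typescript
// Hypothetical type sketch of the canonical vocabularies listed above.
type EntityType =
  | 'CONCEPT' | 'TECHNOLOGY' | 'PERSON' | 'ORGANIZATION' | 'PROCESS'
  | 'FEATURE' | 'EVENT' | 'LOCATION' | 'OTHER';

type RelationshipType =
  | 'USES' | 'IMPLEMENTS' | 'CREATED_BY' | 'PART_OF' | 'DEPENDS_ON'
  | 'RELATED_TO' | 'EXTENDS' | 'CONFIGURES' | 'TRIGGERS' | 'PRODUCES'
  | 'REPLACES' | 'OTHER';

interface Entity {
  id: string;
  name: string;
  type: EntityType;
  description: string;
  sourceDocIds: string[];
}

interface Relationship {
  id: string;
  sourceId: string;   // Entity.id of the source node
  targetId: string;   // Entity.id of the target node
  type: RelationshipType;
  description: string;
  weight: number;
}
```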

Incremental updates

Add new documents to an existing graph without rebuilding from scratch:

// Add a new batch of documents to the existing graph
const updateStats = await graphRag.addDocuments(newDocs);
// Graph re-runs community detection and regenerates reports after each batch

Inspect the graph

The full knowledge graph is available for visualization (D3.js, Cytoscape.js, etc.):

const graph = graphRag.getGraph();

// Entities
console.log([...graph.entities.values()].slice(0, 3));
// [{ id, name, type, description, sourceDocIds }, ...]

// Relationships
console.log([...graph.relationships.values()].slice(0, 3));
// [{ id, sourceId, targetId, type, description, weight }, ...]

// Community reports
console.log([...graph.communityReports.values()].map(r => r.title));
// ['HazelJS Core DI System', 'RAG Pipeline & Vector Stores', ...]

// Statistics
const stats = graphRag.getStats();
console.log(stats.entityTypeBreakdown);
// { TECHNOLOGY: 14, CONCEPT: 12, FEATURE: 9, PROCESS: 7, ... }
console.log(stats.topEntities.slice(0, 3));
// [{ name: 'HazelJS', connections: 12 }, ...]

GraphRAG vs traditional RAG

| | Traditional RAG | GraphRAG |
|---|---|---|
| Storage | Flat vector index | Knowledge graph + vector index |
| Retrieval unit | Text chunk | Entity + relationships + community |
| Cross-document reasoning | Limited | Native |
| Broad thematic questions | Poor | Excellent (community reports) |
| Specific entity questions | Good | Excellent (BFS traversal) |
| Setup cost | Low | Medium (LLM extraction pass) |
| Token cost per query | Low | Medium |
| Best use case | Q&A over focused docs | Multi-document knowledge bases |

Vector Stores

The RAG package supports 5 vector store implementations with a unified interface.

Memory Vector Store (Development)

In-memory storage with no external dependencies. Perfect for development and testing.

Advantages:

  • Zero setup required
  • Extremely fast
  • No external dependencies
  • Great for testing and CI/CD

Limitations:

  • Data lost on restart
  • Limited to available memory
  • Not suitable for production

import { MemoryVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();

// Use it
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });

Pinecone Vector Store (Production, Serverless)

Fully managed, serverless vector database with automatic scaling.

Advantages:

  • Fully managed (no infrastructure)
  • Auto-scaling
  • Global distribution
  • High performance
  • Excellent for serverless deployments

Limitations:

  • Paid service (free tier available)
  • Network latency compared to self-hosted alternatives

import { PineconeVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new PineconeVectorStore(embeddings, {
  apiKey: process.env.PINECONE_API_KEY,
  environment: process.env.PINECONE_ENVIRONMENT,
  indexName: 'my-knowledge-base',
});

await vectorStore.initialize();

// Same API as Memory store
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });

Setup:

  • Sign up at pinecone.io
  • Create an index with dimension matching your embeddings (1536 for OpenAI text-embedding-3-small)
  • Get your API key and environment from the dashboard

Qdrant Vector Store (High-Performance, Self-Hosted)

Rust-based vector database optimized for speed and efficiency.

Advantages:

  • Extremely fast (Rust-based)
  • Advanced filtering capabilities
  • Self-hosted (full control)
  • Open-source
  • Cost-effective for large datasets

Limitations:

  • Requires infrastructure management
  • Setup complexity

import { QdrantVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new QdrantVectorStore(embeddings, {
  url: process.env.QDRANT_URL || 'http://localhost:6333',
  collectionName: 'my-knowledge-base',
});

await vectorStore.initialize();

Setup with Docker:

docker run -p 6333:6333 qdrant/qdrant

Weaviate Vector Store (GraphQL, Flexible)

Open-source vector database with GraphQL API and advanced features.

Advantages:

  • GraphQL API
  • Flexible schema
  • Built-in vectorization
  • Hybrid search support
  • Multi-tenancy

Limitations:

  • Requires infrastructure
  • Learning curve for GraphQL

import { WeaviateVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new WeaviateVectorStore(embeddings, {
  host: process.env.WEAVIATE_HOST || 'http://localhost:8080',
  className: 'MyKnowledgeBase',
});

await vectorStore.initialize();

Setup with Docker:

docker run -p 8080:8080 semitechnologies/weaviate:latest

ChromaDB Vector Store (Prototyping, Embedded)

Lightweight, embeddable vector database perfect for prototyping.

Advantages:

  • Easy setup
  • Lightweight
  • Can run embedded or as a server
  • Great for prototyping
  • Python and JavaScript support

Limitations:

  • Less mature than alternatives
  • Limited scalability for very large datasets

import { ChromaVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new ChromaVectorStore(embeddings, {
  url: process.env.CHROMA_URL || 'http://localhost:8000',
  collectionName: 'my-knowledge-base',
});

await vectorStore.initialize();

// ChromaDB-specific features
const stats = await vectorStore.getStats();
console.log('Collection size:', stats.count);

const preview = await vectorStore.peek(5);
console.log('First 5 documents:', preview);

Setup with Docker:

docker run -p 8000:8000 chromadb/chroma

Vector Store Comparison

| Feature | Memory | Pinecone | Qdrant | Weaviate | ChromaDB |
|---|---|---|---|---|---|
| Setup | None | API Key | Docker | Docker | Docker |
| Persistence | ❌ | ✅ | ✅ | ✅ | ✅ |
| Scalability | Low | High | High | High | Medium |
| Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Cost | Free | Paid | Free (OSS) | Free (OSS) | Free (OSS) |
| Metadata Filtering | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hybrid Search | ❌ | ✅ | ✅ | ✅ | ❌ |
| Multi-tenancy | ❌ | ✅ | ✅ | ✅ | ❌ |
| Best For | Dev/Test | Production | High-perf | GraphQL | Prototyping |

Embedding Providers

Embedding providers convert text into vector representations for semantic search.
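Semantic search then ranks documents by how close their embedding vectors are to the query vector, typically via cosine similarity:

```typescript
// Cosine similarity between two embedding vectors — the core ranking
// operation behind semantic search.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (unrelated)
```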

OpenAI Embeddings

State-of-the-art embeddings from OpenAI with multiple model options.

Models:

  • text-embedding-3-small: 1536 dimensions, fast and cost-effective
  • text-embedding-3-large: 3072 dimensions, highest quality
  • text-embedding-ada-002: Legacy model, 1536 dimensions

import { OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
  dimensions: 1536, // Optional: reduce dimensions for faster search
});

// Embed single text
const vector = await embeddings.embed('Hello world');
console.log('Vector dimensions:', vector.length);

// Embed multiple texts (batch)
const vectors = await embeddings.embedBatch([
  'First document',
  'Second document',
  'Third document',
]);

Cohere Embeddings

Multilingual embeddings from Cohere with excellent performance.

Models:

  • embed-english-v3.0: English-only, high quality
  • embed-multilingual-v3.0: 100+ languages
  • embed-english-light-v3.0: Faster, smaller model

import { CohereEmbeddings } from '@hazeljs/rag';

const embeddings = new CohereEmbeddings({
  apiKey: process.env.COHERE_API_KEY,
  model: 'embed-english-v3.0',
  inputType: 'search_document', // or 'search_query'
});

const vector = await embeddings.embed('Hello world');

Retrieval Strategies

Advanced search strategies for better results.

Hybrid Search

Combines vector similarity search with BM25 keyword search for the best of both worlds.

graph LR
  A["Query"] --> B["Vector Search<br/>(Semantic)"]
  A --> C["BM25 Search<br/>(Keyword)"]
  B --> D["Score Fusion"]
  C --> D
  D --> E["Ranked Results"]
  
  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style B fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style E fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff

How it works:

  • Performs vector similarity search (semantic understanding)
  • Performs BM25 keyword search (exact term matching)
  • Normalizes scores from both methods
  • Combines scores with configurable weights
  • Returns re-ranked results
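The normalize-and-fuse step can be sketched as follows (illustrative; the package's exact normalization may differ):

```typescript
// Min-max normalize each result set, then fuse with configurable weights.
function normalize(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  return max === min ? scores.map(() => 1) : scores.map((s) => (s - min) / (max - min));
}

function fuse(vectorScores: number[], keywordScores: number[], vectorWeight = 0.7): number[] {
  const v = normalize(vectorScores);
  const k = normalize(keywordScores);
  return v.map((s, i) => vectorWeight * s + (1 - vectorWeight) * k[i]);
}

// doc 0 leads on semantics, doc 1 on keywords; fused scores trade off both
console.log(fuse([0.9, 0.5, 0.1], [2.0, 8.0, 4.0]));
```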
import { 
  HybridSearchRetrieval, 
  MemoryVectorStore, 
  OpenAIEmbeddings 
} from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();

const hybridSearch = new HybridSearchRetrieval(vectorStore, {
  vectorWeight: 0.7,  // 70% weight to semantic search
  keywordWeight: 0.3, // 30% weight to keyword search
  topK: 10,
});

// Add documents
await vectorStore.addDocuments(documents);

// Search with hybrid strategy
const results = await hybridSearch.search('machine learning algorithms', {
  topK: 5,
});
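
The fusion step in the list above (normalize, then combine with configurable weights) can be sketched as a pure function. This is an illustration of weighted min-max score fusion, not the library's internal implementation; `Scored`, `minMaxNormalize`, and `fuseScores` are names introduced here.

```typescript
// Illustrative weighted score fusion: min-max normalize each result list,
// then combine per-document scores using the configured weights.
type Scored = { id: string; score: number };

function minMaxNormalize(results: Scored[]): Map<string, number> {
  const scores = results.map(r => r.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min || 1; // guard all-equal scores
  return new Map(results.map(r => [r.id, (r.score - min) / range]));
}

function fuseScores(
  vectorResults: Scored[],
  keywordResults: Scored[],
  vectorWeight = 0.7,
  keywordWeight = 0.3,
): Scored[] {
  const v = minMaxNormalize(vectorResults);
  const k = minMaxNormalize(keywordResults);
  const ids = new Set([...v.keys(), ...k.keys()]);
  return [...ids]
    .map(id => ({
      id,
      score: vectorWeight * (v.get(id) ?? 0) + keywordWeight * (k.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

A document that scores well on both methods outranks one that dominates only a single list.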

Multi-Query Retrieval

Generates multiple query variations using an LLM to improve recall.

graph TD
  A["Original Query"] --> B["LLM Query Generator"]
  B --> C["Query Variation 1"]
  B --> D["Query Variation 2"]
  B --> E["Query Variation 3"]
  C --> F["Vector Search"]
  D --> F
  E --> F
  F --> G["Deduplicate & Rank"]
  G --> H["Final Results"]
  
  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style B fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style D fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style E fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style F fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style G fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
  style H fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff

How it works:

  • Takes user's original question
  • Uses LLM to generate multiple variations
  • Searches with each variation
  • Deduplicates results
  • Re-ranks by frequency and average score

import {
  MultiQueryRetrieval, 
  MemoryVectorStore, 
  OpenAIEmbeddings 
} from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();

const multiQuery = new MultiQueryRetrieval(vectorStore, {
  llmApiKey: process.env.OPENAI_API_KEY,
  numQueries: 3, // Generate 3 query variations
  topK: 10,
});

// Add documents
await vectorStore.addDocuments(documents);

// Search with multiple query variations
const results = await multiQuery.search('How do I deploy my app?', {
  topK: 5,
});
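
The deduplicate-and-rank step described above can be sketched as a pure helper. This is illustrative of ranking by frequency and average score, not the package's actual merge logic; `dedupeAndRank` and the `count + average` scoring formula are assumptions.

```typescript
// Merge result lists from several query variations: deduplicate by document
// id, then rank by how many variations returned the document (frequency)
// plus its average score across those variations.
type Hit = { id: string; score: number };

function dedupeAndRank(resultLists: Hit[][]): Hit[] {
  const stats = new Map<string, { total: number; count: number }>();
  for (const list of resultLists) {
    for (const hit of list) {
      const s = stats.get(hit.id) ?? { total: 0, count: 0 };
      s.total += hit.score;
      s.count += 1;
      stats.set(hit.id, s);
    }
  }
  return [...stats.entries()]
    .map(([id, s]) => ({ id, score: s.count + s.total / s.count }))
    .sort((a, b) => b.score - a.score);
}
```

A document returned by every query variation rises to the top even if no single variation scored it highest.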

Text Splitters

Intelligent document chunking for optimal retrieval.

Recursive Character Text Splitter

Splits text recursively by trying different separators (paragraphs, sentences, words).

import { RecursiveCharacterTextSplitter } from '@hazeljs/rag';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,      // Target chunk size in characters
  chunkOverlap: 200,    // Overlap between chunks for context
  separators: ['\n\n', '\n', '. ', ' '], // Try these in order
});

const chunks = await splitter.splitText(longDocument);

console.log(`Split into ${chunks.length} chunks`);
chunks.forEach((chunk, i) => {
  console.log(`Chunk ${i + 1}: ${chunk.substring(0, 50)}...`);
});

Character Text Splitter

Simple character-based splitting with overlap.

import { CharacterTextSplitter } from '@hazeljs/rag';

const splitter = new CharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
  separator: '\n',
});

const chunks = await splitter.splitText(document);

Token Text Splitter

Splits by token count (useful for LLM context windows).

import { TokenTextSplitter } from '@hazeljs/rag';

const splitter = new TokenTextSplitter({
  chunkSize: 512,      // Max tokens per chunk
  chunkOverlap: 50,    // Overlap in tokens
  encodingName: 'cl100k_base', // OpenAI encoding
});

const chunks = await splitter.splitText(document);

Decorators

Declarative RAG with decorators.

@Embeddable

Mark a class as embeddable for automatic vector storage.

import { Embeddable, Embedded } from '@hazeljs/rag';

@Embeddable({
  vectorStore: 'memory',
  embeddingProvider: 'openai',
})
class Article {
  @Embedded()
  title: string;

  @Embedded()
  content: string;

  metadata: {
    author: string;
    date: Date;
  };
}

@SemanticSearch

Add semantic search to a method.

import { Controller, Get, Query } from '@hazeljs/common';
import { SemanticSearch } from '@hazeljs/rag';

@Controller('search')
class SearchController {
  @Get()
  @SemanticSearch({
    vectorStore: 'pinecone',
    topK: 5,
  })
  async search(@Query('q') query: string) {
    // Results automatically injected
    return { query, results: this.searchResults };
  }
}

@HybridSearch

Add hybrid search (vector + keyword) to a method.

import { Controller, Get, Query } from '@hazeljs/common';
import { HybridSearch } from '@hazeljs/rag';

@Controller('search')
class SearchController {
  @Get('hybrid')
  @HybridSearch({
    vectorStore: 'qdrant',
    vectorWeight: 0.7,
    keywordWeight: 0.3,
    topK: 10,
  })
  async hybridSearch(@Query('q') query: string) {
    return { query, results: this.searchResults };
  }
}

Advanced Retrieval Methods

The RAGService provides several advanced retrieval methods beyond basic semantic search:

Multi-Query Retrieval

Generate multiple search queries from a single question to improve recall:

// Generates 3 variations of the query and combines results
const results = await rag.multiQuery('What is HazelJS?', 3);

Conversational RAG

Maintain conversation context across multiple turns:

const sessionId = 'user-123';

// First question
const result1 = await rag.chat('What is HazelJS?', sessionId);

// Follow-up question - automatically uses conversation history
const result2 = await rag.chat('How do I install it?', sessionId);

// Clear conversation when done
rag.clearChat(sessionId);

Hybrid Search

Combine vector similarity with BM25 keyword search:

const results = await rag.hybridSearch('TypeScript framework', {
  topK: 10,
  vectorWeight: 0.7,  // 70% vector similarity
  keywordWeight: 0.3,  // 30% keyword matching
});

Context Compression

Remove redundant and low-relevance results:

const results = await rag.search('query', { topK: 20 });
const compressed = await rag.compress(results, 'query');
// Returns ~5 most relevant, non-redundant results

Reranking

Re-score results using LLM-based cross-encoder scoring:

const results = await rag.search('query', { topK: 20 });
const reranked = await rag.rerank(results, 'query', 5);
// Returns top 5 after LLM reranking

Ensemble Retrieval

Combine multiple retrieval strategies with weighted fusion:

import { RetrievalStrategy } from '@hazeljs/rag';

const results = await rag.ensemble(
  'query',
  [RetrievalStrategy.SIMILARITY, RetrievalStrategy.HYBRID, RetrievalStrategy.MMR],
  [0.5, 0.3, 0.2]  // Weights for each strategy
);

Error Handling

HazelJS provides typed error classes for robust RAG error handling:

import { RAGService, RAGError, RAGErrorCode } from '@hazeljs/rag';

try {
  await rag.ingest('./documents/guide.pdf');
} catch (error) {
  if (error instanceof RAGError) {
    switch (error.code) {
      case RAGErrorCode.MISSING_DEPENDENCY:
        console.error('Install required package:', error.message);
        // Error includes package name and install command
        break;
      case RAGErrorCode.LOADER_ERROR:
        console.error('Failed to load document:', error.message);
        break;
      case RAGErrorCode.EMBEDDING_ERROR:
        console.error('Embedding generation failed:', error.message);
        // Retry or use fallback
        break;
      case RAGErrorCode.VECTOR_STORE_ERROR:
        console.error('Vector store error:', error.message);
        break;
      default:
        console.error('RAG error:', error.message);
    }
  }
  throw error;
}

Available Error Codes

  • VECTOR_STORE_ERROR - Vector database operation failed
  • EMBEDDING_ERROR - Embedding generation failed
  • LOADER_ERROR - Document loading failed
  • SPLITTER_ERROR - Text splitting failed
  • LLM_GENERATION_ERROR - Answer generation failed
  • INDEX_ERROR - Document indexing failed
  • RETRIEVAL_ERROR - Search/retrieval failed
  • UNSUPPORTED_FORMAT - File format not supported
  • MISSING_DEPENDENCY - Required package not installed
  • CONFIGURATION_ERROR - Invalid configuration

Debugging

Enable debug logging with the HAZELJS_DEBUG environment variable:

HAZELJS_DEBUG=true npm start

Debug output for RAG operations:

2024-03-23T12:00:00.000Z [hazeljs:rag] ingest start source=./docs/guide.pdf
2024-03-23T12:00:01.200Z [hazeljs:rag] ingest loaded docs=15
2024-03-23T12:00:02.500Z [hazeljs:rag] ingest complete ids=15
2024-03-23T12:00:03.000Z [hazeljs:rag] search query=What is HazelJS? strategy=similarity
2024-03-23T12:00:03.300Z [hazeljs:rag] search results=5
2024-03-23T12:00:04.100Z [hazeljs:rag] ask query=What is HazelJS?
2024-03-23T12:00:05.500Z [hazeljs:rag] ask complete answer_len=245 sources=5

Best Practices

Choose the Right Vector Store

  • Development: Use MemoryVectorStore for fast iteration
  • Production (Serverless): Use PineconeVectorStore for zero infrastructure
  • Production (Self-Hosted): Use QdrantVectorStore for performance and cost
  • Prototyping: Use ChromaVectorStore for quick setup
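
One way to apply this guidance is to pick the store from the runtime environment. This is a sketch; `chooseVectorStore` and the `StoreChoice` union are names introduced here, not part of the package.

```typescript
// Pure mapping from runtime environment to a store choice, mirroring the
// guidance above. Constructing the actual store instance (constructor
// arguments, API keys) is left to application startup code.
type StoreChoice = 'memory' | 'pinecone' | 'qdrant' | 'chroma';

function chooseVectorStore(env: string, selfHosted = false): StoreChoice {
  if (env === 'production') return selfHosted ? 'qdrant' : 'pinecone';
  if (env === 'development') return 'memory';
  return 'chroma'; // prototyping and anything else
}
```

Keeping the mapping pure makes it trivial to unit-test, while the side-effecting construction stays at the application's edge.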

Optimize Chunk Size

// For Q&A: Smaller chunks (200-500 chars)
const qaChunks = new RecursiveCharacterTextSplitter({
  chunkSize: 300,
  chunkOverlap: 50,
});

// For summarization: Larger chunks (1000-2000 chars)
const summaryChunks = new RecursiveCharacterTextSplitter({
  chunkSize: 1500,
  chunkOverlap: 200,
});

Use Metadata Filtering

// Add metadata when indexing
await vectorStore.addDocuments([
  {
    content: 'Document content',
    metadata: {
      category: 'technical',
      date: '2024-01-01',
      author: 'John Doe',
    },
  },
]);

// Filter during search
const results = await vectorStore.search('query', {
  topK: 5,
  filter: {
    category: 'technical',
    date: { $gte: '2024-01-01' },
  },
});

Implement Caching

import { CacheService } from '@hazeljs/cache';

class RAGService {
  constructor(
    private vectorStore: VectorStore,
    private cache: CacheService,
  ) {}

  async search(query: string) {
    const cacheKey = `search:${query}`;
    
    // Check cache first
    const cached = await this.cache.get(cacheKey);
    if (cached) return cached;

    // Perform search
    const results = await this.vectorStore.search(query);

    // Cache results
    await this.cache.set(cacheKey, results, 3600); // 1 hour

    return results;
  }
}

Monitor Performance

async function searchWithMetrics(query: string) {
  const start = Date.now();
  
  try {
    const results = await vectorStore.search(query);
    const duration = Date.now() - start;
    
    console.log(`Search completed in ${duration}ms`);
    console.log(`Found ${results.length} results`);
    
    return results;
  } catch (error) {
    console.error('Search failed:', error);
    throw error;
  }
}

Troubleshooting

Connection Errors

// Add retry logic
async function connectWithRetry(vectorStore: VectorStore, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      await vectorStore.initialize();
      console.log('Connected successfully');
      return;
    } catch (error) {
      console.log(`Connection attempt ${i + 1} failed`);
      if (i === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
    }
  }
}

Dimension Mismatch

// Ensure embedding dimensions match vector store configuration
// OpenAI text-embedding-3-small = 1536 dimensions
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
  dimensions: 1536, // Must match index
});

Docker Setup for Self-Hosted Stores

# Qdrant
docker run -p 6333:6333 qdrant/qdrant

# Weaviate
docker run -p 8080:8080 semitechnologies/weaviate:latest

# ChromaDB
docker run -p 8000:8000 chromadb/chroma

Low Search Quality

  • Increase chunk overlap: More context between chunks
  • Adjust chunk size: Smaller chunks for precise retrieval
  • Use hybrid search: Combine semantic and keyword search
  • Add metadata filtering: Narrow down search scope
  • Try multi-query retrieval: Generate multiple search variations
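
To see why chunk overlap matters, here is a minimal splitter sketch; `splitWithOverlap` is introduced here for illustration and is far cruder than the package's text splitters.

```typescript
// Minimal fixed-size splitter with overlap (illustration only; prefer the
// package's text splitters in real code).
function splitWithOverlap(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize');
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const chunks = splitWithOverlap(
  'HazelJS supports retrieval augmented generation out of the box.',
  30,
  10,
);
// The last 10 characters of each chunk reappear at the start of the next,
// so a phrase that straddles a boundary survives intact in one of them.
```

Larger overlap means more duplicated text in the index, but fewer phrases lost at chunk boundaries.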

High Latency

  • Use batch operations: Process multiple documents at once
  • Cache embeddings: Store embeddings with documents
  • Optimize topK: Request fewer results
  • Use production vector stores: Pinecone, Qdrant, or Weaviate
  • Enable connection pooling: For self-hosted databases
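
The embedding-cache tip can be sketched as a memoizing wrapper around any embed function; `withEmbeddingCache` and the `EmbedFn` signature are assumptions modeled on the `embeddings.embed` calls shown earlier.

```typescript
// Memoize embeddings by input text so repeated texts hit the provider once.
// Caching the promise (not the resolved vector) also deduplicates
// concurrent requests for the same text.
type EmbedFn = (text: string) => Promise<number[]>;

function withEmbeddingCache(embed: EmbedFn): EmbedFn {
  const cache = new Map<string, Promise<number[]>>();
  return (text: string) => {
    let pending = cache.get(text);
    if (!pending) {
      pending = embed(text);
      cache.set(text, pending);
    }
    return pending;
  };
}

// e.g. const cachedEmbed = withEmbeddingCache(t => embeddings.embed(t));
```

For long-running services, swap the `Map` for an LRU or TTL cache so the memo does not grow without bound.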

Memory System

The RAG package includes a powerful memory system for building context-aware AI applications. See the Memory System Guide for complete documentation.

Quick Example

import {
  RAGPipelineWithMemory,
  MemoryManager,
  HybridMemory,
  BufferMemory,
  VectorMemory,
} from '@hazeljs/rag';

// Setup memory
const buffer = new BufferMemory({ maxSize: 20 });
const vectorMemory = new VectorMemory(vectorStore, embeddings);
const hybridMemory = new HybridMemory(buffer, vectorMemory);

const memoryManager = new MemoryManager(hybridMemory, {
  maxConversationLength: 20,
  summarizeAfter: 50,
  entityExtraction: true,
});

// Create RAG with memory
const rag = new RAGPipelineWithMemory(
  { vectorStore, embeddingProvider: embeddings },
  memoryManager,
  llmFunction
);

// Query with conversation context
const response = await rag.queryWithMemory(
  'What did we discuss about pricing?',
  'session-123',
  'user-456'
);

console.log(response.answer);
console.log('Memories:', response.memories);
console.log('History:', response.conversationHistory);

Memory Features

  • Conversation Memory: Track multi-turn conversations with auto-summarization
  • Entity Memory: Remember people, companies, and relationships
  • Fact Storage: Store and recall facts semantically
  • Working Memory: Temporary context for current tasks
  • Hybrid Storage: Fast buffer + persistent vector storage
  • Semantic Search: Find relevant memories using embeddings

Learn more in the Memory System Guide.

What's Next?

Recipes

Recipe: Knowledge Base Service

// File: src/knowledge/knowledge.service.ts
import { Service } from '@hazeljs/core';
import { RAGPipeline, FileLoader } from '@hazeljs/rag';

@Service()
export class KnowledgeService {
  constructor(private readonly rag: RAGPipeline) {}

  async ingestFolder(folderPath: string) {
    const loader = new FileLoader({ directory: folderPath, extensions: ['.md', '.txt', '.pdf'] });
    const documents = await loader.load();
    await this.rag.ingest(documents);
    return { ingested: documents.length };
  }

  async search(query: string) {
    const results = await this.rag.search(query, { topK: 5 });
    return results.map(r => ({
      content: r.content,
      score: r.score,
      source: r.metadata.source,
    }));
  }
}

Recipe: RAG-Powered Q&A Endpoint

// File: src/qa/qa.controller.ts
import { Controller, Post, Body } from '@hazeljs/core';
import { RAGPipeline } from '@hazeljs/rag';
import { AIEnhancedService } from '@hazeljs/ai';

@Controller('qa')
export class QAController {
  constructor(
    private readonly rag: RAGPipeline,
    private readonly ai: AIEnhancedService,
  ) {}

  @Post()
  async ask(@Body('question') question: string) {
    const context = await this.rag.search(question, { topK: 5 });
    const contextText = context.map(r => r.content).join('\n\n');

    const answer = await this.ai
      .chat(question)
      .system(`Answer based on the following context:\n\n${contextText}`)
      .model('gpt-4-turbo-preview')
      .text();

    return { answer, sources: context.map(r => r.metadata.source) };
  }
}

Recipe: Hybrid Search with Reranking

// File: src/search/search.service.ts
import { Service } from '@hazeljs/core';
import { RAGPipeline } from '@hazeljs/rag';

@Service()
export class SearchService {
  constructor(private readonly rag: RAGPipeline) {}

  async hybridSearch(query: string) {
    const results = await this.rag.search(query, {
      topK: 10,
      strategy: 'hybrid',
      semanticWeight: 0.7,
      keywordWeight: 0.3,
      rerank: true,
      rerankTopK: 5,
    });

    return results;
  }
}

API Reference

For complete API documentation, see the RAG API Reference.