RAG Package

The @hazeljs/rag package provides a comprehensive Retrieval-Augmented Generation (RAG) implementation with document loaders, knowledge graph retrieval (GraphRAG), memory management, and semantic search — everything you need to build production-grade AI knowledge bases.

Purpose

Building RAG applications requires integrating vector databases, managing embeddings, loading documents from diverse sources, implementing search strategies, and maintaining conversation context. The @hazeljs/rag package solves all of this in one place:

  • 11 Document Loaders: TXT, Markdown, JSON, CSV, HTML, PDF, DOCX, web scraping, YouTube transcripts, GitHub repos, and inline text — all with a unified BaseDocumentLoader API
  • GraphRAG: Knowledge graph-based retrieval that extracts entities and relationships, detects communities, and enables entity-centric (local) and thematic (global) search that outperforms flat cosine similarity
  • 5 Vector Store Implementations: Memory, Pinecone, Qdrant, Weaviate, and ChromaDB with a unified interface
  • Memory System: Conversation tracking, entity memory, fact storage, and working memory for context-aware AI
  • Multiple Embedding Providers: OpenAI and Cohere embeddings with easy extensibility
  • Advanced Retrieval Strategies: Hybrid search (vector + BM25), multi-query retrieval, and semantic search
  • Intelligent Text Splitting: Multiple chunking strategies for optimal retrieval
  • RAG + Memory Integration: Combine document retrieval with conversation history for enhanced context
  • Decorator-Based API: @Embeddable, @SemanticSearch, @HybridSearch for declarative RAG
  • Production-Ready: Battle-tested patterns with proper error handling and TypeScript support

Architecture

graph TD
  A["Documents"] --> B["Text Splitter"]
  B --> C["Chunks"]
  C --> D["Embedding Provider<br/>(OpenAI, Cohere)"]
  D --> E["Vector Embeddings"]
  E --> F["Vector Store<br/>(Memory, Pinecone, Qdrant, etc.)"]
  G["User Query"] --> H["Embedding Provider"]
  H --> I["Query Vector"]
  I --> J["Retrieval Strategy<br/>(Semantic, Hybrid, Multi-Query)"]
  J --> F
  F --> K["Ranked Results"]
  
  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style B fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style C fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style D fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style E fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style F fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff

Key Components

  • RAG Pipeline: Orchestrates document indexing, query processing, and result retrieval
  • Vector Stores: Pluggable storage backends for embeddings and documents
  • Embedding Providers: Generate vector embeddings from text
  • Retrieval Strategies: Advanced search algorithms (hybrid, multi-query, BM25)
  • Text Splitters: Intelligent document chunking for optimal retrieval
  • Decorators: @Embeddable, @SemanticSearch, @HybridSearch for declarative RAG

Advantages

Vector Store Flexibility

Start with in-memory storage for development, then seamlessly switch to Pinecone, Qdrant, Weaviate, or ChromaDB for production—all with the same API.
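
Because every store exposes the same interface, switching backends can be as simple as an environment check. A minimal sketch (constructor options follow the examples later on this page; the environment variable names are placeholders):

```typescript
import {
  MemoryVectorStore,
  QdrantVectorStore,
  OpenAIEmbeddings,
} from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_API_KEY });

// Pick the backend from the environment; the rest of the code is unchanged
const vectorStore =
  process.env.NODE_ENV === 'production'
    ? new QdrantVectorStore(embeddings, {
        url: process.env.QDRANT_URL || 'http://localhost:6333',
        collectionName: 'knowledge-base',
      })
    : new MemoryVectorStore(embeddings);

await vectorStore.initialize();

// Same calls regardless of backend
await vectorStore.addDocuments([{ content: 'Hello', metadata: {} }]);
const hits = await vectorStore.search('greeting', { topK: 3 });
```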

Advanced Retrieval

Built-in support for hybrid search (combining vector and keyword search), multi-query retrieval (generating multiple search queries), and BM25 keyword ranking.

Developer Experience

Decorator-based API means you can add RAG capabilities with a single decorator. No need to manage vector stores, embeddings, or search logic manually.

Production Ready

Proper error handling, TypeScript support, connection pooling, and battle-tested patterns make it ready for production use.

Extensible

Easy to add custom vector stores, embedding providers, or retrieval strategies by implementing simple interfaces.
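
For example, a custom embedding provider only needs embed and embedBatch methods matching the shape the built-in providers use (the exact interface name is an assumption; this toy character-hash embedder is for illustration only, not a real embedding model):

```typescript
// A toy provider matching the embed/embedBatch shape of the built-in
// providers. Illustration only — not a real embedding model.
class HashEmbeddings {
  constructor(private readonly dimensions = 8) {}

  async embed(text: string): Promise<number[]> {
    // Deterministic bag-of-characters hash, then L2-normalize
    const vec = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vec[text.charCodeAt(i) % this.dimensions] += 1;
    }
    const norm = Math.sqrt(vec.reduce((s, v) => s + v * v, 0)) || 1;
    return vec.map((v) => v / norm);
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map((t) => this.embed(t)));
  }
}
```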

Installation

# Core RAG package
npm install @hazeljs/rag

# Peer dependencies (choose based on your needs)
npm install openai  # For OpenAI embeddings and GraphRAG LLM

# Optional: Vector store clients (install only what you need)
npm install @pinecone-database/pinecone  # For Pinecone
npm install @qdrant/js-client-rest       # For Qdrant
npm install weaviate-ts-client           # For Weaviate
npm install chromadb                     # For ChromaDB

Optional Document Loader Dependencies:

# For Cohere embeddings
npm install cohere-ai

# For PDF loading (PdfLoader)
npm install pdf-parse

# For Word document loading (DocxLoader)
npm install mammoth

# For CSS-selector web scraping (WebLoader / HtmlFileLoader)
npm install cheerio

Quick Start

Basic RAG Pipeline

The simplest way to get started with RAG:

import { 
  RAGPipeline, 
  OpenAIEmbeddings, 
  MemoryVectorStore 
} from '@hazeljs/rag';

// Setup embeddings provider
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
  dimensions: 1536,
});

// Create vector store
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();

// Create RAG pipeline
const rag = new RAGPipeline({
  vectorStore,
  embeddingProvider: embeddings,
  topK: 5, // Return top 5 results
});

await rag.initialize();

// Index documents
await rag.addDocuments([
  {
    content: 'HazelJS is a modern TypeScript framework for building scalable applications.',
    metadata: { category: 'framework', source: 'docs' },
  },
  {
    content: 'The RAG package provides semantic search and vector database integration.',
    metadata: { category: 'rag', source: 'docs' },
  },
]);

// Query with semantic search
const results = await rag.search('What is HazelJS?', { topK: 3 });

console.log('Search Results:');
results.forEach((result, index) => {
  console.log(`${index + 1}. ${result.content}`);
  console.log(`   Score: ${result.score}`);
  console.log(`   Metadata:`, result.metadata);
});

Document Loaders

Document loaders are the entry point of every RAG pipeline. They read data from a source and return a standardized Document[] array ready for chunking and indexing. @hazeljs/rag ships 11 built-in loaders covering the most common sources.

Loader overview

| Loader | Source | Extra install? |
|---|---|---|
| TextFileLoader | .txt files | — |
| MarkdownFileLoader | .md / .mdx with heading splits and YAML front-matter | — |
| JSONFileLoader | .json arrays or objects with textKey / jsonPointer extraction | — |
| CSVFileLoader | .csv rows mapped to documents with configurable columns | — |
| HtmlFileLoader | .html tag stripping; CSS selectors via cheerio | optional cheerio |
| DirectoryLoader | Recursive directory walk, auto-detects loader by extension | — |
| PdfLoader | PDFs via pdf-parse; split by page or as one document | npm i pdf-parse |
| DocxLoader | Word documents via mammoth; plain text or HTML output | npm i mammoth |
| WebLoader | HTTP page scraping; CSS selectors via cheerio; retry/timeout | optional cheerio |
| YouTubeTranscriptLoader | YouTube transcript download (no API key); segment by duration | — |
| GitHubLoader | GitHub REST API; filter by directory, extension, maxFiles | — |

File loaders

import {
  TextFileLoader,
  MarkdownFileLoader,
  JSONFileLoader,
  CSVFileLoader,
  HtmlFileLoader,
} from '@hazeljs/rag';

// Plain text — one document per file
const textDocs = await new TextFileLoader({
  filePath: './docs/notes.txt',
}).load();

// Markdown — split into one document per heading section
const mdDocs = await new MarkdownFileLoader({
  filePath: './docs/guide.md',
  splitByHeading: true,        // creates one Document per H2/H3 section
  parseYamlFrontMatter: true,  // front-matter fields become metadata
}).load();
// mdDocs[0].metadata.heading === 'Installation'

// JSON — extract a specific field as the document content
const jsonDocs = await new JSONFileLoader({
  filePath: './data/articles.json',
  textKey: 'body',             // use 'body' field as content
  // jsonPointer: '/items',    // navigate nested JSON with a JSON Pointer
}).load();

// CSV — map rows to documents; choose which columns become content vs metadata
const csvDocs = await new CSVFileLoader({
  filePath: './data/faqs.csv',
  contentColumns: ['question', 'answer'],
  metadataColumns: ['category'],
}).load();

// HTML — strips all tags, extracts title
const htmlDocs = await new HtmlFileLoader({
  filePath: './docs/index.html',
  selector: 'main',            // optional: only extract content inside <main>
}).load();

DirectoryLoader — bulk ingest

DirectoryLoader walks a directory recursively and automatically delegates each file to the right typed loader. This is the fastest way to ingest a knowledge base from disk:

import { DirectoryLoader } from '@hazeljs/rag';

const docs = await new DirectoryLoader({
  dirPath: './knowledge-base',
  recursive: true,
  // extensions: ['.md', '.txt'],   // filter to specific types
  // exclude: ['**/node_modules/**'],
}).load();

console.log(`Loaded ${docs.length} documents from ${[...new Set(docs.map(d => d.metadata?.source))].length} files`);

PDF and Word documents

import { PdfLoader, DocxLoader } from '@hazeljs/rag';

// PDF — one document per page or the whole file
const pdfDocs = await new PdfLoader({
  filePath: './reports/annual-report.pdf',
  splitByPage: true,   // each page becomes its own Document
}).load();

// Word document
const wordDocs = await new DocxLoader({
  filePath: './contracts/agreement.docx',
  outputFormat: 'text',  // 'text' (default) or 'html'
}).load();

WebLoader — scrape any URL

import { WebLoader } from '@hazeljs/rag';

// Single URL
const docs = await new WebLoader({
  urls: ['https://hazeljs.com/docs'],
  timeout: 10_000,
  maxRetries: 3,
  // selector: 'article',   // optional: CSS selector (requires cheerio)
}).load();

// Multiple URLs in one call
const batchDocs = await new WebLoader({
  urls: [
    'https://hazeljs.com/docs/installation',
    'https://hazeljs.com/blog/graphrag',
  ],
}).load();

YouTubeTranscriptLoader — no API key needed

import { YouTubeTranscriptLoader } from '@hazeljs/rag';

// Works with full URL or just the video ID
const transcriptDocs = await new YouTubeTranscriptLoader({
  videoUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
  segmentDuration: 60,   // group transcript into 60-second chunks
}).load();

// Each doc has metadata: { videoId, startTime, endTime, source }

GitHubLoader — index entire repositories

import { GitHubLoader } from '@hazeljs/rag';

const repoDocs = await new GitHubLoader({
  owner: 'hazeljs',
  repo: 'hazel',
  ref: 'main',                  // branch or tag
  directory: 'docs',            // only load this sub-directory
  extensions: ['.md', '.mdx'],  // only Markdown files
  maxFiles: 100,
  token: process.env.GITHUB_TOKEN, // optional; avoids 60 req/hr rate limit
}).load();

Custom loaders with @Loader and DocumentLoaderRegistry

Extend BaseDocumentLoader to add any data source. The @Loader decorator registers metadata for auto-detection:

import {
  BaseDocumentLoader,
  Loader,
  DocumentLoaderRegistry,
} from '@hazeljs/rag';

@Loader({
  name: 'NotionLoader',
  description: 'Loads pages from a Notion database',
  extensions: [],
  mimeTypes: ['application/vnd.notion'],
})
export class NotionLoader extends BaseDocumentLoader {
  constructor(private readonly databaseId: string) {
    super();
  }

  async load() {
    const pages = await fetchNotionDatabase(this.databaseId);
    return pages.map((page) =>
      this.createDocument(page.content, {
        source: `notion:${this.databaseId}/${page.id}`,
        title: page.title,
        lastEdited: page.lastEditedTime,
      }),
    );
  }
}

// Register once at startup — then DirectoryLoader and the registry can use it
DocumentLoaderRegistry.register(
  NotionLoader,
  (databaseId: string) => new NotionLoader(databaseId),
);

Full ingest pipeline

Putting it all together with the RAG pipeline:

import {
  DirectoryLoader,
  GitHubLoader,
  WebLoader,
  RAGPipeline,
  OpenAIEmbeddings,
  MemoryVectorStore,
  RecursiveTextSplitter,
} from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({ apiKey: process.env.OPENAI_API_KEY });
const vectorStore = new MemoryVectorStore(embeddings);
const splitter = new RecursiveTextSplitter({ chunkSize: 800, chunkOverlap: 150 });

const pipeline = new RAGPipeline({ vectorStore, embeddingProvider: embeddings, textSplitter: splitter });
await pipeline.initialize();

// Load from multiple sources
const [localDocs, githubDocs, webDocs] = await Promise.all([
  new DirectoryLoader({ dirPath: './knowledge-base', recursive: true }).load(),
  new GitHubLoader({ owner: 'hazeljs', repo: 'hazel', directory: 'docs', extensions: ['.md'] }).load(),
  new WebLoader({ urls: ['https://hazeljs.com/docs'] }).load(),
]);

// Index everything at once
const ids = await pipeline.addDocuments([...localDocs, ...githubDocs, ...webDocs]);
console.log(`Indexed ${ids.length} chunks`);

GraphRAG

GraphRAG extends traditional vector search by building a knowledge graph of entities and relationships extracted from your documents. Instead of searching raw text chunks by cosine similarity, it retrieves structured facts and cross-document themes — answering questions that flat vector search cannot.

See the full GraphRAG Guide for an in-depth walkthrough.

Why GraphRAG?

Traditional RAG retrieves the K most similar text chunks. This works well for narrow questions but fails for:

  • Cross-document reasoning — "How do all the components in the system relate to each other?"
  • Thematic questions — "What are the main architectural layers of this codebase?"
  • Entity-relationship queries — "What does the AgentGraph depend on?"

GraphRAG solves this with two complementary retrieval modes:

| Mode | How it works | Best for |
|---|---|---|
| Local | Finds entities matching the query, traverses K hops in the knowledge graph, assembles entity + relationship context | Specific "what is / how does" questions |
| Global | Ranks LLM-generated community reports by relevance; assembles thematic summaries | Broad "what are the main themes / architecture" questions |
| Hybrid | Runs both in parallel, merges contexts, single LLM synthesis call | Best default — covers both dimensions |

Architecture

graph TD
  A["Documents"] --> B["Text Chunks"]
  B --> C["Entity Extractor<br/>(LLM)"]
  C --> D["Knowledge Graph<br/>(GraphStore)"]
  D --> E["Community Detector<br/>(Label Propagation)"]
  E --> F["Community Summarizer<br/>(LLM Reports)"]

  G["User Query"] --> H{"Search Mode"}
  H -->|"local"| I["Seed Entity Lookup"]
  H -->|"global"| J["Community Report Ranking"]
  H -->|"hybrid"| K["Both in Parallel"]

  I --> L["BFS Graph Traversal<br/>(K hops)"]
  L --> M["Entity + Relationship Context"]
  J --> N["Top-K Report Summaries"]
  K --> O["Merged Context"]

  M --> P["LLM Synthesis"]
  N --> P
  O --> P
  P --> Q["Answer + Sources"]

  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style F fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style Q fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff

Building the knowledge graph

import OpenAI from 'openai';
import {
  GraphRAGPipeline,
  DirectoryLoader,
} from '@hazeljs/rag';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Create the pipeline — provide an LLM function for extraction and synthesis
const graphRag = new GraphRAGPipeline({
  llm: async (prompt) => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      temperature: 0,
      messages: [{ role: 'user', content: prompt }],
    });
    return res.choices[0].message.content ?? '';
  },
  extractionChunkSize: 2000,      // max chars per LLM extraction call
  generateCommunityReports: true, // produce LLM summaries per community cluster
  maxCommunitySize: 15,           // split communities larger than this
  localSearchDepth: 2,            // BFS hops for local search
  localSearchTopK: 5,             // seed entities per query
  globalSearchTopK: 5,            // community reports used in global search
});

// Load documents from any source
const docs = await new DirectoryLoader({ dirPath: './knowledge-base', recursive: true }).load();

// build() extracts entities, builds the graph, detects communities, and writes reports
const stats = await graphRag.build(docs);
console.log(stats);
// {
//   documentsProcessed: 12,
//   entitiesExtracted: 47,
//   relationshipsExtracted: 63,
//   communitiesDetected: 8,
//   communityReportsGenerated: 8,
//   duration: 18400,
// }

Local search — entity-centric

Best for specific, factual questions about named concepts, technologies, or processes:

const result = await graphRag.search('How does HazelJS dependency injection work?', {
  mode: 'local',
  depth: 2,   // traverse up to 2 hops from seed entities
  topK: 5,    // start from 5 seed entities
});

console.log(result.answer);
// "HazelJS uses constructor injection. When the IoC container resolves
//  a @Service(), it reads TypeScript metadata to identify constructor
//  parameters and injects resolved instances automatically..."

console.log(result.entities.map(e => `${e.name} [${e.type}]`));
// ['Dependency Injection [CONCEPT]', 'IoC Container [TECHNOLOGY]',
//  '@Service [FEATURE]', 'HazelJS [TECHNOLOGY]', ...]

console.log(result.relationships.map(r => `${r.type}: ${r.description}`));
// ['USES: HazelJS uses constructor injection pattern', ...]

Global search — community reports

Best for broad questions about themes, architecture, or the overall scope of a knowledge base:

const result = await graphRag.search(
  'What are the main architectural layers of the HazelJS framework?',
  {
    mode: 'global',
    topK: 5,  // include top 5 community reports by relevance
  },
);

console.log(result.communities[0]);
// {
//   communityId: 'community_0',
//   title: 'HazelJS Core Infrastructure Layer',
//   summary: 'This community represents the foundational layer of HazelJS...',
//   findings: ['HazelJS Core provides HTTP and DI foundation', ...],
//   rating: 9,
// }

Hybrid search — best default

Runs local and global in parallel and merges their contexts before a single LLM synthesis call:

const result = await graphRag.search(
  'What vector stores does @hazeljs/rag support and how do I swap them?',
  {
    mode: 'hybrid',      // default when mode is omitted
    includeGraph: true,  // include entities + relationships in result
    includeCommunities: true,
  },
);

console.log(`${result.mode} search in ${result.duration}ms`);
console.log(`Entities found: ${result.entities.length}`);
console.log(`Communities used: ${result.communities.length}`);

Entity and relationship types

The LLM extractor maps every concept to one of these canonical types, making the graph consistent and queryable:

Entity types: CONCEPT · TECHNOLOGY · PERSON · ORGANIZATION · PROCESS · FEATURE · EVENT · LOCATION · OTHER

Relationship types: USES · IMPLEMENTS · CREATED_BY · PART_OF · DEPENDS_ON · RELATED_TO · EXTENDS · CONFIGURES · TRIGGERS · PRODUCES · REPLACES · OTHER
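
Because the types are canonical, the graph is easy to slice. For instance, counting entities per type (shown here on illustrative records shaped like those returned by getGraph(); see "Inspect the graph" below):

```typescript
// Illustrative entity records, shaped like those returned by getGraph()
const entities = [
  { id: 'e1', name: 'HazelJS', type: 'TECHNOLOGY' },
  { id: 'e2', name: 'Dependency Injection', type: 'CONCEPT' },
  { id: 'e3', name: 'RAG Pipeline', type: 'FEATURE' },
  { id: 'e4', name: 'Pinecone', type: 'TECHNOLOGY' },
];

// Count entities per canonical type
function countByType(items: { type: string }[]): Record<string, number> {
  return items.reduce<Record<string, number>>((acc, e) => {
    acc[e.type] = (acc[e.type] ?? 0) + 1;
    return acc;
  }, {});
}

console.log(countByType(entities));
// { TECHNOLOGY: 2, CONCEPT: 1, FEATURE: 1 }
```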

Incremental updates

Add new documents to an existing graph without rebuilding from scratch:

// Add a new batch of documents to the existing graph
const updateStats = await graphRag.addDocuments(newDocs);
// Graph re-runs community detection and regenerates reports after each batch

Inspect the graph

The full knowledge graph is available for visualization (D3.js, Cytoscape.js, etc.):

const graph = graphRag.getGraph();

// Entities
console.log([...graph.entities.values()].slice(0, 3));
// [{ id, name, type, description, sourceDocIds }, ...]

// Relationships
console.log([...graph.relationships.values()].slice(0, 3));
// [{ id, sourceId, targetId, type, description, weight }, ...]

// Community reports
console.log([...graph.communityReports.values()].map(r => r.title));
// ['HazelJS Core DI System', 'RAG Pipeline & Vector Stores', ...]

// Statistics
const stats = graphRag.getStats();
console.log(stats.entityTypeBreakdown);
// { TECHNOLOGY: 14, CONCEPT: 12, FEATURE: 9, PROCESS: 7, ... }
console.log(stats.topEntities.slice(0, 3));
// [{ name: 'HazelJS', connections: 12 }, ...]

GraphRAG vs traditional RAG

| | Traditional RAG | GraphRAG |
|---|---|---|
| Storage | Flat vector index | Knowledge graph + vector index |
| Retrieval unit | Text chunk | Entity + relationships + community |
| Cross-document reasoning | Limited | Native |
| Broad thematic questions | Poor | Excellent (community reports) |
| Specific entity questions | Good | Excellent (BFS traversal) |
| Setup cost | Low | Medium (LLM extraction pass) |
| Token cost per query | Low | Medium |
| Best use case | Q&A over focused docs | Multi-document knowledge bases |

Vector Stores

The RAG package supports 5 vector store implementations with a unified interface.

Memory Vector Store (Development)

In-memory storage with no external dependencies. Perfect for development and testing.

Advantages:

  • Zero setup required
  • Extremely fast
  • No external dependencies
  • Great for testing and CI/CD

Limitations:

  • Data lost on restart
  • Limited to available memory
  • Not suitable for production

import { MemoryVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();

// Use it
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });

Pinecone Vector Store (Production, Serverless)

Fully managed, serverless vector database with automatic scaling.

Advantages:

  • Fully managed (no infrastructure)
  • Auto-scaling
  • Global distribution
  • High performance
  • Excellent for serverless deployments

Limitations:

  • Paid service (free tier available)
  • Network latency compared with self-hosted alternatives

import { PineconeVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new PineconeVectorStore(embeddings, {
  apiKey: process.env.PINECONE_API_KEY,
  environment: process.env.PINECONE_ENVIRONMENT,
  indexName: 'my-knowledge-base',
});

await vectorStore.initialize();

// Same API as Memory store
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });

Setup:

  • Sign up at pinecone.io
  • Create an index with dimension matching your embeddings (1536 for OpenAI text-embedding-3-small)
  • Get your API key and environment from the dashboard

Qdrant Vector Store (High-Performance, Self-Hosted)

Rust-based vector database optimized for speed and efficiency.

Advantages:

  • Extremely fast (Rust-based)
  • Advanced filtering capabilities
  • Self-hosted (full control)
  • Open-source
  • Cost-effective for large datasets

Limitations:

  • Requires infrastructure management
  • Setup complexity

import { QdrantVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new QdrantVectorStore(embeddings, {
  url: process.env.QDRANT_URL || 'http://localhost:6333',
  collectionName: 'my-knowledge-base',
});

await vectorStore.initialize();

Setup with Docker:

docker run -p 6333:6333 qdrant/qdrant

Weaviate Vector Store (GraphQL, Flexible)

Open-source vector database with GraphQL API and advanced features.

Advantages:

  • GraphQL API
  • Flexible schema
  • Built-in vectorization
  • Hybrid search support
  • Multi-tenancy

Limitations:

  • Requires infrastructure
  • Learning curve for GraphQL

import { WeaviateVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new WeaviateVectorStore(embeddings, {
  host: process.env.WEAVIATE_HOST || 'http://localhost:8080',
  className: 'MyKnowledgeBase',
});

await vectorStore.initialize();

Setup with Docker:

docker run -p 8080:8080 semitechnologies/weaviate:latest

ChromaDB Vector Store (Prototyping, Embedded)

Lightweight, embeddable vector database perfect for prototyping.

Advantages:

  • Easy setup
  • Lightweight
  • Can run embedded or as a server
  • Great for prototyping
  • Python and JavaScript support

Limitations:

  • Less mature than alternatives
  • Limited scalability for very large datasets

import { ChromaVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new ChromaVectorStore(embeddings, {
  url: process.env.CHROMA_URL || 'http://localhost:8000',
  collectionName: 'my-knowledge-base',
});

await vectorStore.initialize();

// ChromaDB-specific features
const stats = await vectorStore.getStats();
console.log('Collection size:', stats.count);

const preview = await vectorStore.peek(5);
console.log('First 5 documents:', preview);

Setup with Docker:

docker run -p 8000:8000 chromadb/chroma

Vector Store Comparison

| Feature | Memory | Pinecone | Qdrant | Weaviate | ChromaDB |
|---|---|---|---|---|---|
| Setup | None | API Key | Docker | Docker | Docker |
| Persistence | No | Yes | Yes | Yes | Yes |
| Scalability | Low | High | High | High | Medium |
| Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Cost | Free | Paid | Free (OSS) | Free (OSS) | Free (OSS) |
| Best For | Dev/Test | Production | High-perf | GraphQL | Prototyping |
| Metadata Filtering | Yes | Yes | Yes | Yes | Yes |
| Hybrid Search | Via retrieval strategy | Yes | Yes | Yes | Via retrieval strategy |
| Multi-tenancy | No | Yes | Yes | Yes | No |

Embedding Providers

Embedding providers convert text into vector representations for semantic search.

OpenAI Embeddings

State-of-the-art embeddings from OpenAI with multiple model options.

Models:

  • text-embedding-3-small: 1536 dimensions, fast and cost-effective
  • text-embedding-3-large: 3072 dimensions, highest quality
  • text-embedding-ada-002: Legacy model, 1536 dimensions

import { OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
  dimensions: 1536, // Optional: reduce dimensions for faster search
});

// Embed single text
const vector = await embeddings.embed('Hello world');
console.log('Vector dimensions:', vector.length);

// Embed multiple texts (batch)
const vectors = await embeddings.embedBatch([
  'First document',
  'Second document',
  'Third document',
]);

Cohere Embeddings

Multilingual embeddings from Cohere with excellent performance.

Models:

  • embed-english-v3.0: English-only, high quality
  • embed-multilingual-v3.0: 100+ languages
  • embed-english-light-v3.0: Faster, smaller model

import { CohereEmbeddings } from '@hazeljs/rag';

const embeddings = new CohereEmbeddings({
  apiKey: process.env.COHERE_API_KEY,
  model: 'embed-english-v3.0',
  inputType: 'search_document', // or 'search_query'
});

const vector = await embeddings.embed('Hello world');

Retrieval Strategies

Advanced search strategies for better results.

Hybrid Search

Combines vector similarity search with BM25 keyword search for best results.

graph LR
  A["Query"] --> B["Vector Search<br/>(Semantic)"]
  A --> C["BM25 Search<br/>(Keyword)"]
  B --> D["Score Fusion"]
  C --> D
  D --> E["Ranked Results"]
  
  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style B fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style E fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff

How it works:

  • Performs vector similarity search (semantic understanding)
  • Performs BM25 keyword search (exact term matching)
  • Normalizes scores from both methods
  • Combines scores with configurable weights
  • Returns re-ranked results

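The normalize-and-combine step can be sketched as follows (a simplified illustration of weighted score fusion, not the package's exact implementation):

```typescript
interface Scored { id: string; score: number }

// Min-max normalize a result list's scores to [0, 1]
function normalize(results: Scored[]): Map<string, number> {
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min || 1;
  return new Map(results.map((r) => [r.id, (r.score - min) / range]));
}

// Fuse vector and keyword result lists with configurable weights
function fuse(
  vector: Scored[],
  keyword: Scored[],
  vectorWeight = 0.7,
  keywordWeight = 0.3,
): Scored[] {
  const v = normalize(vector);
  const k = normalize(keyword);
  const ids = new Set([...v.keys(), ...k.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score: vectorWeight * (v.get(id) ?? 0) + keywordWeight * (k.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```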
import { 
  HybridSearchRetrieval, 
  MemoryVectorStore, 
  OpenAIEmbeddings 
} from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();

const hybridSearch = new HybridSearchRetrieval(vectorStore, {
  vectorWeight: 0.7,  // 70% weight to semantic search
  keywordWeight: 0.3, // 30% weight to keyword search
  topK: 10,
});

// Add documents
await vectorStore.addDocuments(documents);

// Search with hybrid strategy
const results = await hybridSearch.search('machine learning algorithms', {
  topK: 5,
});

Multi-Query Retrieval

Generates multiple query variations using an LLM to improve recall.

graph TD
  A["Original Query"] --> B["LLM Query Generator"]
  B --> C["Query Variation 1"]
  B --> D["Query Variation 2"]
  B --> E["Query Variation 3"]
  C --> F["Vector Search"]
  D --> F
  E --> F
  F --> G["Deduplicate & Rank"]
  G --> H["Final Results"]
  
  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style B fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style D fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style E fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style F fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style G fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
  style H fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff

How it works:

  • Takes user's original question
  • Uses LLM to generate multiple variations
  • Searches with each variation
  • Deduplicates results
  • Re-ranks by frequency and average score

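The dedupe-and-rerank step can be sketched as follows (simplified; the package's real implementation may differ):

```typescript
interface Hit { id: string; content: string; score: number }

// Merge results from several query variations: deduplicate by id,
// rank first by how many variations retrieved the document, then by
// its average score across those variations.
function mergeResults(resultSets: Hit[][]): Hit[] {
  const tally = new Map<string, { hit: Hit; count: number; total: number }>();
  for (const results of resultSets) {
    for (const hit of results) {
      const entry = tally.get(hit.id) ?? { hit, count: 0, total: 0 };
      entry.count += 1;
      entry.total += hit.score;
      tally.set(hit.id, entry);
    }
  }
  return [...tally.values()]
    .sort((a, b) => b.count - a.count || b.total / b.count - a.total / a.count)
    .map((e) => ({ ...e.hit, score: e.total / e.count }));
}
```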
import { 
  MultiQueryRetrieval, 
  MemoryVectorStore, 
  OpenAIEmbeddings 
} from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();

const multiQuery = new MultiQueryRetrieval(vectorStore, {
  llmApiKey: process.env.OPENAI_API_KEY,
  numQueries: 3, // Generate 3 query variations
  topK: 10,
});

// Add documents
await vectorStore.addDocuments(documents);

// Search with multiple query variations
const results = await multiQuery.search('How do I deploy my app?', {
  topK: 5,
});

Text Splitters

Intelligent document chunking for optimal retrieval.

Recursive Character Text Splitter

Splits text recursively by trying different separators (paragraphs, sentences, words).

import { RecursiveCharacterTextSplitter } from '@hazeljs/rag';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,      // Target chunk size in characters
  chunkOverlap: 200,    // Overlap between chunks for context
  separators: ['\n\n', '\n', '. ', ' '], // Try these in order
});

const chunks = await splitter.splitText(longDocument);

console.log(`Split into ${chunks.length} chunks`);
chunks.forEach((chunk, i) => {
  console.log(`Chunk ${i + 1}: ${chunk.substring(0, 50)}...`);
});

Character Text Splitter

Simple character-based splitting with overlap.

import { CharacterTextSplitter } from '@hazeljs/rag';

const splitter = new CharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
  separator: '\n',
});

const chunks = await splitter.splitText(document);

Token Text Splitter

Splits by token count (useful for LLM context windows).

import { TokenTextSplitter } from '@hazeljs/rag';

const splitter = new TokenTextSplitter({
  chunkSize: 512,      // Max tokens per chunk
  chunkOverlap: 50,    // Overlap in tokens
  encodingName: 'cl100k_base', // OpenAI encoding
});

const chunks = await splitter.splitText(document);

Decorators

Declarative RAG with decorators.

@Embeddable

Mark a class as embeddable for automatic vector storage.

import { Embeddable, Embedded } from '@hazeljs/rag';

@Embeddable({
  vectorStore: 'memory',
  embeddingProvider: 'openai',
})
class Article {
  @Embedded()
  title: string;

  @Embedded()
  content: string;

  metadata: {
    author: string;
    date: Date;
  };
}

@SemanticSearch

Add semantic search to a method.

import { Controller, Get, Query } from '@hazeljs/common';
import { SemanticSearch } from '@hazeljs/rag';

@Controller('search')
class SearchController {
  @Get()
  @SemanticSearch({
    vectorStore: 'pinecone',
    topK: 5,
  })
  async search(@Query('q') query: string) {
    // Results automatically injected
    return { query, results: this.searchResults };
  }
}

@HybridSearch

Add hybrid search (vector + keyword) to a method.

import { Controller, Get, Query } from '@hazeljs/common';
import { HybridSearch } from '@hazeljs/rag';

@Controller('search')
class SearchController {
  @Get('hybrid')
  @HybridSearch({
    vectorStore: 'qdrant',
    vectorWeight: 0.7,
    keywordWeight: 0.3,
    topK: 10,
  })
  async hybridSearch(@Query('q') query: string) {
    return { query, results: this.searchResults };
  }
}

Best Practices

Choose the Right Vector Store

  • Development: Use MemoryVectorStore for fast iteration
  • Production (Serverless): Use PineconeVectorStore for zero infrastructure
  • Production (Self-Hosted): Use QdrantVectorStore for performance and cost
  • Prototyping: Use ChromaVectorStore for quick setup

Optimize Chunk Size

// For Q&A: smaller chunks (200-500 chars)
const qaSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 300,
  chunkOverlap: 50,
});

// For summarization: larger chunks (1000-2000 chars)
const summarySplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1500,
  chunkOverlap: 200,
});

Use Metadata Filtering

// Add metadata when indexing
await vectorStore.addDocuments([
  {
    content: 'Document content',
    metadata: {
      category: 'technical',
      date: '2024-01-01',
      author: 'John Doe',
    },
  },
]);

// Filter during search
const results = await vectorStore.search('query', {
  topK: 5,
  filter: {
    category: 'technical',
    date: { $gte: '2024-01-01' },
  },
});
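The filter above uses Mongo-style operators. As a sketch of the assumed semantics (bare values mean equality, operator objects like `{ $gte }` compare numerically or lexically), filter matching could look like this; the actual behavior depends on the vector store backend:

```typescript
type Filter = Record<
  string,
  string | number | { $gte?: string | number; $lte?: string | number }
>;

// Sketch of Mongo-style filter matching: every key in the filter must match
// the corresponding metadata field.
function matchesFilter(metadata: Record<string, any>, filter: Filter): boolean {
  return Object.entries(filter).every(([key, cond]) => {
    const value = metadata[key];
    if (cond !== null && typeof cond === 'object') {
      if ('$gte' in cond && !(value >= cond.$gte!)) return false;
      if ('$lte' in cond && !(value <= cond.$lte!)) return false;
      return true;
    }
    return value === cond; // bare value: exact equality
  });
}
```

Note that ISO date strings (`'2024-01-01'`) compare correctly with `$gte` because lexical order matches chronological order.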

Implement Caching

import { CacheService } from '@hazeljs/cache';

class RAGService {
  constructor(
    private vectorStore: VectorStore,
    private cache: CacheService,
  ) {}

  async search(query: string) {
    const cacheKey = `search:${query}`;
    
    // Check cache first
    const cached = await this.cache.get(cacheKey);
    if (cached) return cached;

    // Perform search
    const results = await this.vectorStore.search(query);

    // Cache results
    await this.cache.set(cacheKey, results, 3600); // 1 hour

    return results;
  }
}

Monitor Performance

async function searchWithMetrics(query: string) {
  const start = Date.now();
  
  try {
    const results = await vectorStore.search(query);
    const duration = Date.now() - start;
    
    console.log(`Search completed in ${duration}ms`);
    console.log(`Found ${results.length} results`);
    
    return results;
  } catch (error) {
    console.error('Search failed:', error);
    throw error;
  }
}

Troubleshooting

Connection Errors

// Add retry logic
async function connectWithRetry(vectorStore: VectorStore, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      await vectorStore.initialize();
      console.log('Connected successfully');
      return;
    } catch (error) {
      console.log(`Connection attempt ${i + 1} failed`);
      if (i === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
    }
  }
}

Dimension Mismatch

// Ensure embedding dimensions match vector store configuration
// OpenAI text-embedding-3-small = 1536 dimensions
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
  dimensions: 1536, // Must match index
});

Docker Setup for Self-Hosted Stores

# Qdrant
docker run -p 6333:6333 qdrant/qdrant

# Weaviate
docker run -p 8080:8080 semitechnologies/weaviate:latest

# ChromaDB
docker run -p 8000:8000 chromadb/chroma

Low Search Quality

  • Increase chunk overlap: More context between chunks
  • Adjust chunk size: Smaller chunks for precise retrieval
  • Use hybrid search: Combine semantic and keyword search
  • Add metadata filtering: Narrow down search scope
  • Try multi-query retrieval: Generate multiple search variations
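To make the hybrid-search suggestion concrete, here is a sketch of weighted score fusion using the 0.7/0.3 weights from the `@HybridSearch` example. It assumes min-max normalization before blending, which is one common approach (the package may fuse scores differently):

```typescript
interface Scored {
  id: string;
  score: number;
}

// Sketch: normalize each result list to [0, 1], then blend with weights.
function fuseScores(
  vector: Scored[],
  keyword: Scored[],
  vectorWeight = 0.7,
  keywordWeight = 0.3,
): Scored[] {
  const normalize = (results: Scored[]) => {
    const scores = results.map((r) => r.score);
    const min = Math.min(...scores);
    const range = Math.max(...scores) - min || 1;
    return new Map(results.map((r) => [r.id, (r.score - min) / range]));
  };
  const v = normalize(vector);
  const k = normalize(keyword);
  const ids = new Set([...v.keys(), ...k.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score: vectorWeight * (v.get(id) ?? 0) + keywordWeight * (k.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

Documents that rank highly in both lists rise to the top, while results found by only one retriever are down-weighted rather than discarded.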

High Latency

  • Use batch operations: Process multiple documents at once
  • Cache embeddings: Store embeddings with documents
  • Optimize topK: Request fewer results
  • Use production vector stores: Pinecone, Qdrant, or Weaviate
  • Enable connection pooling: For self-hosted databases
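The batch-operations suggestion amounts to embedding and upserting N documents in one API call instead of N calls. A small generic helper covers the batching; the usage sketch below assumes hypothetical `embedBatch`/`addDocuments` methods, so adapt it to your provider's actual API:

```typescript
// Split items into fixed-size batches so embedding and upsert calls
// hit the API once per batch instead of once per document.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Usage sketch (assumed APIs):
// for (const batch of toBatches(documents, 100)) {
//   const vectors = await embeddings.embedBatch(batch.map((d) => d.content));
//   await vectorStore.addDocuments(batch, vectors);
// }
```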

Memory System

The RAG package includes a powerful memory system for building context-aware AI applications. See the Memory System Guide for complete documentation.

Quick Example

import {
  RAGPipelineWithMemory,
  MemoryManager,
  HybridMemory,
  BufferMemory,
  VectorMemory,
} from '@hazeljs/rag';

// Setup memory
const buffer = new BufferMemory({ maxSize: 20 });
const vectorMemory = new VectorMemory(vectorStore, embeddings);
const hybridMemory = new HybridMemory(buffer, vectorMemory);

const memoryManager = new MemoryManager(hybridMemory, {
  maxConversationLength: 20,
  summarizeAfter: 50,
  entityExtraction: true,
});

// Create RAG with memory
const rag = new RAGPipelineWithMemory(
  { vectorStore, embeddingProvider: embeddings },
  memoryManager,
  llmFunction
);

// Query with conversation context
const response = await rag.queryWithMemory(
  'What did we discuss about pricing?',
  'session-123',
  'user-456'
);

console.log(response.answer);
console.log('Memories:', response.memories);
console.log('History:', response.conversationHistory);

Memory Features

  • Conversation Memory: Track multi-turn conversations with auto-summarization
  • Entity Memory: Remember people, companies, and relationships
  • Fact Storage: Store and recall facts semantically
  • Working Memory: Temporary context for current tasks
  • Hybrid Storage: Fast buffer + persistent vector storage
  • Semantic Search: Find relevant memories using embeddings

Learn more in the Memory System Guide.

What's Next?

API Reference

For complete API documentation, see the RAG API Reference.