HazelJS Memory System
The HazelJS Memory System provides persistent context and conversation management for AI applications: conversation history, entities, facts, and working memory. Memory lives in `@hazeljs/rag` (`BufferMemory`, `VectorMemory`, `HybridMemory`) and can optionally be backed by `@hazeljs/memory` for a unified, multi-store model shared by RAG and agents.
Quick Reference
- Purpose: Manages conversation history, entity tracking, facts, and working memory for AI applications, providing persistent context across multi-turn interactions for both RAG pipelines and AI agents.
- When to use: When AI agents or RAG pipelines need to remember conversation history across turns, track entities (people, companies), store facts, or maintain working memory for complex multi-step reasoning.
- Key concepts: `MemoryManager`, 5 memory types (Conversation, Entity, Fact, Event, Working), 3 storage strategies (`BufferMemory`, `VectorMemory`, `HybridMemory`), semantic search over memories, auto-summarization, importance scoring, memory decay, shared memory between RAG and agents, `@hazeljs/memory` backend adapter.
- Inputs: User messages, assistant responses, entity data, facts, `sessionId` for per-session isolation.
- Outputs: Retrieved conversation context, relevant memories, entity profiles, fact lists, memory statistics.
- Dependencies: `@hazeljs/rag` (primary), optionally `@hazeljs/memory` for persistent backends (Prisma, Redis).
- Common patterns: Create a `MemoryManager` with storage config → pass it to both `RagService` and `AgentRuntime` for shared context → use `sessionId` to isolate conversations → use `HybridMemory` for best retrieval quality.
- Common mistakes: Not sharing the same `MemoryManager` instance between RAG and agents (leads to fragmented context); using `BufferMemory` alone for long conversations (loses old context); not setting a `sessionId` (mixes conversations); forgetting to configure embeddings for `VectorMemory`.
When to Use: Storage Strategy Decision Guide
| Strategy | Speed | Retrieval quality | Use when |
|---|---|---|---|
| `BufferMemory` | Fastest | Recency-only (FIFO) | Short conversations, prototyping, low-latency requirements |
| `VectorMemory` | Slower (embedding lookups) | Best semantic relevance | Long conversations, fact-heavy workloads, agent reasoning |
| `HybridMemory` | Medium | Best overall (recent + semantic) | Production applications that need both recent and relevant memories |
Overview
The HazelJS Memory System provides:
- 5 Memory Types: Conversation, Entity, Fact, Event, and Working memory
- 3 Storage Strategies: BufferMemory (fast), VectorMemory (semantic), HybridMemory (best of both)
- Semantic Search: Find relevant memories using embeddings
- Auto-Summarization: Compress old conversations automatically
- Entity Tracking: Remember people, companies, and relationships
- Importance Scoring: Prioritize relevant information
- RAG Integration: Combine document retrieval with conversation context
- Shared memory (RAG + Agent): One `MemoryManager` in-process, passed to both RAG and every `AgentRuntime` so they share the same conversation and context (see Shared memory: RAG + Agent)
- Optional `@hazeljs/memory` backend: Use the Memory package (in-memory, Prisma, Redis) and the adapter from `@hazeljs/rag/memory-hazel` to back RAG/agent memory with one store
Architecture
```mermaid
graph TD
    A["User Message"] --> B["Memory Manager"]
    B --> C["Conversation Memory"]
    B --> D["Entity Memory"]
    B --> E["Fact Memory"]
    B --> F["Working Memory"]
    C --> G["Buffer Store<br/>(Recent)"]
    C --> H["Vector Store<br/>(Long-term)"]
    I["Query"] --> J["Memory Search"]
    J --> G
    J --> H
    H --> K["Semantic Search"]
    K --> L["Relevant Memories"]
    M["RAG Pipeline"] --> N["Document Retrieval"]
    M --> L
    N --> O["Enhanced Context"]
    L --> O
    O --> P["LLM Response"]
    style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
    style B fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
    style M fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
    style P fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
```
Memory Types
Conversation Memory
Track multi-turn conversations with automatic summarization.
```typescript
import { MemoryManager, BufferMemory } from '@hazeljs/rag';

const memoryStore = new BufferMemory({ maxSize: 100 });
const memoryManager = new MemoryManager(memoryStore, {
  maxConversationLength: 20,
  summarizeAfter: 50,
});
await memoryManager.initialize();

// Add messages
await memoryManager.addMessage(
  { role: 'user', content: 'What is HazelJS?' },
  'session-123'
);
await memoryManager.addMessage(
  { role: 'assistant', content: 'HazelJS is an AI-native framework...' },
  'session-123'
);

// Get history
const history = await memoryManager.getConversationHistory('session-123', 10);

// Summarize
const summary = await memoryManager.summarizeConversation('session-123');
```
Features:
- Sliding window for recent messages
- Automatic summarization of old conversations
- Token-aware context management
- Multi-session support
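The sliding-window part of this behavior can be sketched in plain TypeScript. The `SlidingWindow` class below is illustrative only, not the package's internals; the real `MemoryManager` additionally summarizes evicted messages and respects token budgets.

```typescript
// Illustrative sliding window over conversation messages: keep at most
// maxLength messages, evicting the oldest first.
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

class SlidingWindow {
  private messages: Message[] = [];
  constructor(private maxLength: number) {}

  add(message: Message): void {
    this.messages.push(message);
    // Evict the oldest messages once the window is full
    if (this.messages.length > this.maxLength) {
      this.messages.splice(0, this.messages.length - this.maxLength);
    }
  }

  history(limit?: number): Message[] {
    return limit ? this.messages.slice(-limit) : [...this.messages];
  }
}

const convo = new SlidingWindow(3);
for (const content of ['a', 'b', 'c', 'd']) {
  convo.add({ role: 'user', content });
}
// 'a' has been evicted; only the three most recent messages remain
```

In the real system, summarization (covered below) is what keeps the evicted messages from being lost entirely.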
Entity Memory
Track entities (people, companies, concepts) mentioned in conversations.
```typescript
// Track an entity
await memoryManager.trackEntity({
  name: 'Alice',
  type: 'person',
  attributes: {
    role: 'engineer',
    company: 'TechCorp',
  },
  relationships: [
    { type: 'works_at', target: 'TechCorp' },
  ],
  firstSeen: new Date(),
  lastSeen: new Date(),
  mentions: 1,
});

// Retrieve entity (returns null if not found)
const alice = await memoryManager.getEntity('Alice');

// Update entity
await memoryManager.updateEntity('Alice', {
  attributes: { ...alice?.attributes, status: 'premium' },
});

// Get all entities
const entities = await memoryManager.getAllEntities('session-123');
```
Use Cases:
- Customer relationship management
- Personalized recommendations
- Knowledge graph construction
- Context-aware responses
Semantic Memory (Facts)
Store and recall facts with semantic understanding.
```typescript
// Store facts
await memoryManager.storeFact(
  'User prefers dark mode',
  { userId: 'user-123', category: 'preference' }
);
await memoryManager.storeFact(
  'HazelJS supports TypeScript decorators',
  { category: 'framework-feature' }
);

// Recall facts semantically
const facts = await memoryManager.recallFacts('user preferences', {
  topK: 5,
  minScore: 0.7,
});

// Update a fact (storeFact returns the fact's id)
await memoryManager.updateFact(factId, 'User prefers light mode');
```
Features:
- Semantic search across facts
- Time-based relevance
- Conflict detection
- Automatic consolidation
Working Memory
Temporary scratchpad for current task context.
```typescript
// Set context
await memoryManager.setContext('current_task', 'checkout', 'session-123');
await memoryManager.setContext('cart_items', ['item1', 'item2'], 'session-123');

// Get context
const task = await memoryManager.getContext('current_task', 'session-123');
const items = await memoryManager.getContext('cart_items', 'session-123');

// Clear context
await memoryManager.clearContext('session-123');
```
Use Cases:
- Multi-step workflows
- State management
- Temporary calculations
- Task coordination
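Conceptually, working memory is a per-session key/value scratchpad. A minimal sketch, with illustrative names rather than the package's implementation:

```typescript
// Per-session scratchpad mirroring setContext/getContext/clearContext.
// Each session gets its own isolated Map of keys to values.
class WorkingMemory {
  private sessions = new Map<string, Map<string, unknown>>();

  setContext(key: string, value: unknown, sessionId: string): void {
    if (!this.sessions.has(sessionId)) this.sessions.set(sessionId, new Map());
    this.sessions.get(sessionId)!.set(key, value);
  }

  getContext(key: string, sessionId: string): unknown {
    return this.sessions.get(sessionId)?.get(key);
  }

  clearContext(sessionId: string): void {
    this.sessions.delete(sessionId);
  }
}

const wm = new WorkingMemory();
wm.setContext('current_task', 'checkout', 'session-123');
wm.setContext('current_task', 'browse', 'session-456'); // isolated per session
```

The session map is why a consistent `sessionId` matters: two different ids never see each other's context.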
Storage Strategies
BufferMemory
Fast FIFO in-memory buffer for recent memories.
```typescript
import { BufferMemory } from '@hazeljs/rag';

const buffer = new BufferMemory({
  maxSize: 100,
  ttl: 3600000, // 1 hour in milliseconds
});
```
Best For:
- Development and testing
- Recent conversation history
- Low-latency requirements
- Temporary context
Advantages:
- Extremely fast (in-memory)
- Zero setup
- No external dependencies
- Automatic TTL expiration
Limitations:
- Data lost on restart
- Limited capacity
- No semantic search
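The FIFO-plus-TTL behavior can be sketched as follows; the class and its lazy-expiration strategy are assumptions for illustration, not `BufferMemory`'s actual internals:

```typescript
// Bounded FIFO buffer with TTL: evict oldest on overflow, drop expired on read.
interface Entry { content: string; storedAt: number; }

class FifoBuffer {
  private entries: Entry[] = [];
  constructor(private maxSize: number, private ttlMs: number) {}

  add(content: string, now = Date.now()): void {
    this.entries.push({ content, storedAt: now });
    if (this.entries.length > this.maxSize) this.entries.shift(); // FIFO eviction
  }

  all(now = Date.now()): string[] {
    // TTL expiration happens lazily on read
    this.entries = this.entries.filter((e) => now - e.storedAt < this.ttlMs);
    return this.entries.map((e) => e.content);
  }
}

const buf = new FifoBuffer(2, 1000);
buf.add('first', 0);
buf.add('second', 0);
buf.add('third', 0); // 'first' evicted by maxSize
```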
VectorMemory
Stores memories as embeddings for semantic search.
```typescript
import { VectorMemory, MemoryVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
});
const vectorStore = new MemoryVectorStore(embeddings);
const vectorMemory = new VectorMemory(vectorStore, embeddings, {
  collectionName: 'memories',
});
```
Best For:
- Long-term memory storage
- Semantic search requirements
- Production deployments
- Large memory volumes
Advantages:
- Semantic search
- Persistent storage
- Scalable
- Works with any vector store
Limitations:
- Slower than buffer
- Requires embeddings
- External dependencies
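Under the hood, semantic retrieval ranks stored memories by embedding similarity. The sketch below uses toy 2-d vectors and cosine similarity in place of a real embedding provider; the `topK`/`minScore` semantics mirror the search options shown elsewhere in this page:

```typescript
// Rank stored memories by cosine similarity to a query vector.
type Vec = number[];

function cosine(a: Vec, b: Vec): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: Vec) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

interface Stored { content: string; embedding: Vec; }

function search(memories: Stored[], query: Vec, topK: number, minScore: number) {
  return memories
    .map((m) => ({ content: m.content, score: cosine(m.embedding, query) }))
    .filter((r) => r.score >= minScore) // drop weak matches
    .sort((a, b) => b.score - a.score)  // best first
    .slice(0, topK);
}

const memories: Stored[] = [
  { content: 'user prefers dark mode', embedding: [1, 0] },
  { content: 'pricing discussed last week', embedding: [0, 1] },
  { content: 'user likes dark themes', embedding: [0.9, 0.1] },
];
const results = search(memories, [1, 0], 2, 0.7);
// Both dark-mode memories match; the pricing memory is filtered out
```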
HybridMemory
Combines buffer and vector storage for optimal performance.
```typescript
import { HybridMemory, BufferMemory, VectorMemory } from '@hazeljs/rag';

const buffer = new BufferMemory({ maxSize: 20 });
const vectorMemory = new VectorMemory(vectorStore, embeddings);
const hybrid = new HybridMemory(buffer, vectorMemory, {
  archiveThreshold: 15, // Archive after 15 messages
});
```
Best For:
- Production applications
- Balancing speed and persistence
- Large-scale deployments
- Best of both worlds
How It Works:
- Recent memories stay in fast buffer
- Old memories automatically archive to vector store
- Searches check both stores
- Deduplication ensures consistency
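The archive flow above can be sketched as two tiers, with keyword matching standing in for the vector store's semantic search; this is an illustration of the tiering idea, not the package's code:

```typescript
// Two-tier memory: recent items in a bounded buffer, overflow archived.
class HybridSketch {
  private buffer: string[] = [];
  private archive: string[] = [];
  constructor(private archiveThreshold: number) {}

  add(content: string): void {
    this.buffer.push(content);
    while (this.buffer.length > this.archiveThreshold) {
      // Oldest buffered memory is archived, not lost
      this.archive.push(this.buffer.shift()!);
    }
  }

  search(term: string): string[] {
    const hit = (m: string) => m.includes(term);
    // Search both tiers; deduplicate in case a memory exists in both
    return [...new Set([...this.buffer.filter(hit), ...this.archive.filter(hit)])];
  }
}

const hybrid = new HybridSketch(2);
['alpha one', 'beta one', 'alpha two'].forEach((m) => hybrid.add(m));
// 'alpha one' has been archived; search still finds it alongside 'alpha two'
```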
RAG Integration
Combine memory with document retrieval for context-aware responses.
```typescript
import {
  RAGPipelineWithMemory,
  MemoryManager,
  HybridMemory,
  BufferMemory,
  VectorMemory,
  MemoryVectorStore,
  OpenAIEmbeddings,
} from '@hazeljs/rag';

// Setup memory
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
});
const buffer = new BufferMemory({ maxSize: 20 });
const memoryVectorStore = new MemoryVectorStore(embeddings);
const vectorMemory = new VectorMemory(memoryVectorStore, embeddings);
const hybridMemory = new HybridMemory(buffer, vectorMemory);

const memoryManager = new MemoryManager(hybridMemory, {
  maxConversationLength: 20,
  summarizeAfter: 50,
  entityExtraction: true,
});

// Setup RAG
const documentVectorStore = new MemoryVectorStore(embeddings);
const rag = new RAGPipelineWithMemory(
  {
    vectorStore: documentVectorStore,
    embeddingProvider: embeddings,
    topK: 5,
  },
  memoryManager,
  llmFunction
);
await rag.initialize();

// Add documents
await rag.addDocuments([
  {
    content: 'HazelJS is a modern TypeScript framework...',
    metadata: { source: 'docs' },
  },
]);

// Query with memory context
const response = await rag.queryWithMemory(
  'What did we discuss about pricing?',
  'session-123',
  'user-456'
);

console.log(response.answer);
console.log('Sources:', response.sources);
console.log('Memories:', response.memories);
console.log('History:', response.conversationHistory);
```
Enhanced Context
The RAG pipeline with memory combines three sources of context:
- Document Retrieval: Relevant documents from knowledge base
- Conversation History: Recent messages in the conversation
- Relevant Memories: Semantically similar past interactions
```typescript
// Automatic fact extraction
const response = await rag.queryWithLearning(
  'Tell me about HazelJS features',
  'session-123',
  'user-456'
);
// Facts from the response are automatically stored

// Get conversation summary
const summary = await rag.getConversationSummary('session-123');

// Recall specific facts
const facts = await rag.recallFacts('user preferences', 5);

// Memory statistics
const stats = await rag.getMemoryStats('session-123');
```
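How the three context sources might be merged into a single prompt can be sketched as follows; the section names and layout are invented for illustration and are not the library's actual prompt template:

```typescript
// Merge documents, history, and memories into one prompt string.
interface ContextSources {
  documents: string[];           // from document retrieval
  conversationHistory: string[]; // recent messages
  memories: string[];            // semantically relevant past interactions
}

function buildPrompt(query: string, ctx: ContextSources): string {
  const section = (title: string, items: string[]) =>
    items.length ? `${title}:\n${items.map((i) => `- ${i}`).join('\n')}` : '';
  return [
    section('Relevant documents', ctx.documents),
    section('Conversation so far', ctx.conversationHistory),
    section('Relevant memories', ctx.memories),
    `Question: ${query}`,
  ].filter(Boolean).join('\n\n');
}

const prompt = buildPrompt('What did we discuss about pricing?', {
  documents: ['Pricing tiers: free, pro, enterprise'],
  conversationHistory: ['user: How much is the pro plan?'],
  memories: ['User asked about discounts last session'],
});
```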
Shared memory: RAG + Agent
RAG and agents can share the same memory in one process: create one store and one `MemoryManager`, then pass that `MemoryManager` to both `RAGPipelineWithMemory` and every `AgentRuntime`. The same `sessionId` means the same conversation history and context for both.
- In-process — No separate memory service, no HTTP. Everything runs in the same Node.js process.
- Central provisioning — Create the store and `MemoryManager` once at app startup (e.g. in a module or bootstrap), then inject or pass the same instance into RAG and all agents.
```typescript
import { MemoryManager, RAGPipelineWithMemory, BufferMemory } from '@hazeljs/rag';
import { AgentRuntime } from '@hazeljs/agent';

// One store, one MemoryManager
const ragStore = new BufferMemory({ maxSize: 50 });
const memoryManager = new MemoryManager(ragStore, { maxConversationLength: 30 });
await memoryManager.initialize();

// Pass the same MemoryManager to RAG and to every agent
const rag = new RAGPipelineWithMemory(config, memoryManager, llmFunction);
const agentA = new AgentRuntime({ memoryManager, llmProvider: openAIProvider });
const agentB = new AgentRuntime({ memoryManager, llmProvider: openAIProvider });

// Same sessionId => same conversation and context for RAG and agents
const sessionId = 'user-123-session';
await rag.queryWithMemory('What is HazelJS?', sessionId, 'user-123');
await agentA.execute('my-agent', 'What did we just discuss?', { sessionId, userId: 'user-123', enableMemory: true });
// Agent sees the RAG conversation via memoryManager.getConversationHistory(sessionId)
```
Using @hazeljs/memory as the backend
To back RAG and agent memory with the Memory package (in-memory, Prisma, Redis, or composite), use the `HazelMemoryStoreAdapter` so RAG's `MemoryManager` talks to `@hazeljs/memory`:
- Install the optional peer: `npm install @hazeljs/memory`
- Create a store and `MemoryService` from `@hazeljs/memory`, wrap it with the adapter from `@hazeljs/rag/memory-hazel`, then create one `MemoryManager` and pass it to RAG and all agents.
```typescript
import { MemoryManager, RAGPipelineWithMemory } from '@hazeljs/rag';
import { createHazelMemoryStoreAdapter } from '@hazeljs/rag/memory-hazel';
import { MemoryService, createDefaultMemoryStore } from '@hazeljs/memory';
import { AgentRuntime } from '@hazeljs/agent';

const hazelStore = createDefaultMemoryStore();
const memoryService = new MemoryService(hazelStore);
const ragStore = createHazelMemoryStoreAdapter(memoryService);
const memoryManager = new MemoryManager(ragStore);

const rag = new RAGPipelineWithMemory(config, memoryManager, llmFunction);
const agent = new AgentRuntime({ memoryManager, llmProvider });
```
- In-process: RAG, agents, and memory run in the same Node process; no HTTP.
- Shared memory: One store and one `MemoryManager` created at app level and passed to RAG and every `AgentRuntime`.
- For Prisma or other backends, use the appropriate store factory from `@hazeljs/memory` (or `@hazeljs/memory/prisma`) before wrapping with `createHazelMemoryStoreAdapter`.
See the Memory Package for categories, multi-store options, and the full API.
Advanced Features
Memory Search
Search across all memories semantically:
```typescript
// MemoryType is assumed to be exported from @hazeljs/rag
import { MemoryType } from '@hazeljs/rag';

const relevantMemories = await memoryManager.relevantMemories(
  'pricing and discounts',
  {
    sessionId: 'session-123',
    types: [MemoryType.CONVERSATION, MemoryType.FACT],
    topK: 5,
    minScore: 0.7,
  }
);
```
Importance Scoring
Automatically calculate and use importance scores:
```typescript
const memoryManager = new MemoryManager(memoryStore, {
  importanceScoring: true, // Enable automatic scoring
});

// Memories with higher importance are retained longer.
// Questions and long content get higher scores.
```
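A heuristic consistent with that note can be sketched as follows; the weights are invented for illustration and are not the package's actual scoring function:

```typescript
// Toy importance heuristic: questions and long content score higher.
function importanceScore(content: string): number {
  let score = 0.5; // neutral baseline
  if (content.trim().endsWith('?')) score += 0.2; // questions matter more
  if (content.length > 100) score += 0.2;         // long content carries detail
  return Math.min(score, 1);
}

const short = importanceScore('ok');
const question = importanceScore('What discounts apply to the pro plan?');
const longNote = importanceScore('x'.repeat(150));
```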
Memory Decay
Time-based relevance scoring:
```typescript
const memoryManager = new MemoryManager(memoryStore, {
  memoryDecay: true,
  decayRate: 0.1, // 10% decay per time unit
});

// Older memories gradually become less relevant
```
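One common way to model this is exponential decay per elapsed time unit; whether the package uses exactly this curve is an assumption:

```typescript
// Relevance decays multiplicatively: each time unit keeps (1 - decayRate)
// of the previous score.
function decayedRelevance(baseScore: number, decayRate: number, elapsedUnits: number): number {
  return baseScore * Math.pow(1 - decayRate, elapsedUnits);
}

const fresh = decayedRelevance(1.0, 0.1, 0); // just stored
const aged = decayedRelevance(1.0, 0.1, 5);  // five time units later
```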
Memory Statistics
Monitor memory usage:
```typescript
const stats = await memoryManager.getStats('session-123');

console.log(`Total memories: ${stats.totalMemories}`);
console.log(`By type:`, stats.byType);
console.log(`Average importance: ${stats.averageImportance}`);
console.log(`Oldest: ${stats.oldestMemory}`);
console.log(`Newest: ${stats.newestMemory}`);
```
Memory Pruning
Clean up old or low-importance memories:
```typescript
// Prune memories older than 30 days
const prunedOld = await memoryManager.prune({
  olderThan: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000),
});

// Prune low-importance memories
const prunedLowImportance = await memoryManager.prune({
  minImportance: 0.5,
});
```
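The pruning semantics above amount to filtering by a cutoff date and an importance floor; a self-contained sketch under those assumptions:

```typescript
// Keep memories that are newer than the cutoff AND above the importance floor.
interface Mem { content: string; createdAt: Date; importance: number; }

function prune(
  memories: Mem[],
  opts: { olderThan?: Date; minImportance?: number }
): { kept: Mem[]; prunedCount: number } {
  const kept = memories.filter((m) => {
    if (opts.olderThan && m.createdAt < opts.olderThan) return false;
    if (opts.minImportance !== undefined && m.importance < opts.minImportance) return false;
    return true;
  });
  return { kept, prunedCount: memories.length - kept.length };
}

const sample: Mem[] = [
  { content: 'old chat', createdAt: new Date('2023-01-01'), importance: 0.9 },
  { content: 'recent note', createdAt: new Date('2024-06-01'), importance: 0.3 },
  { content: 'recent fact', createdAt: new Date('2024-06-01'), importance: 0.8 },
];
const result = prune(sample, { olderThan: new Date('2024-01-01'), minImportance: 0.5 });
// 'old chat' fails the date cutoff, 'recent note' fails the importance floor
```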
Configuration
Memory Manager Config
```typescript
const config = {
  maxConversationLength: 20, // Max messages in buffer
  summarizeAfter: 50,        // Summarize after N messages
  entityExtraction: true,    // Auto-extract entities
  importanceScoring: true,   // Calculate importance scores
  memoryDecay: false,        // Enable time-based decay
  decayRate: 0.1,            // Decay rate (if enabled)
  maxWorkingMemorySize: 10,  // Max working memory items
};

const memoryManager = new MemoryManager(memoryStore, config);
```
Buffer Memory Config
```typescript
const bufferConfig = {
  maxSize: 100, // Max memories in buffer
  ttl: 3600000, // Time to live (ms)
};

const buffer = new BufferMemory(bufferConfig);
```
Hybrid Memory Config
```typescript
const hybridConfig = {
  bufferSize: 20,       // Buffer size
  archiveThreshold: 15, // Archive after N messages
  ttl: 3600000,         // Buffer TTL
};

const hybrid = new HybridMemory(buffer, vectorMemory, hybridConfig);
```
Use Cases
Customer Support Bot
```typescript
// Remember customer information
await memoryManager.trackEntity({
  name: 'Jane Smith',
  type: 'customer',
  attributes: { tier: 'premium', accountId: 'ACC-123' },
  // ...
});

// Store support history
await memoryManager.storeFact(
  'Customer reported login issues on 2024-01-15',
  { customerId: 'ACC-123', category: 'support' }
);

// Context-aware responses
const response = await rag.queryWithMemory(
  'What was my previous issue?',
  'session-123',
  'ACC-123'
);
```
Personal AI Assistant
```typescript
// Remember preferences
await memoryManager.storeFact('User prefers concise responses');
await memoryManager.storeFact('User timezone is PST');

// Track tasks
await memoryManager.setContext('active_tasks', ['email', 'meeting'], 'session-123');

// Personalized responses
const response = await rag.queryWithMemory(
  'What should I focus on today?',
  'session-123'
);
```
Educational Tutor
```typescript
// Track learning progress
await memoryManager.trackEntity({
  name: 'Student-123',
  type: 'student',
  attributes: {
    level: 'intermediate',
    completedLessons: ['intro', 'basics'],
  },
  // ...
});

// Remember misconceptions
await memoryManager.storeFact(
  'Student confused about async/await',
  { studentId: 'Student-123', topic: 'javascript' }
);
```
Best Practices
Choose the Right Store
- Development: Use `BufferMemory` for fast iteration
- Production: Use `HybridMemory` for best performance
- Semantic Search: Use `VectorMemory` when search is critical
Set Appropriate Limits
- Configure `maxConversationLength` based on LLM token limits
- Set `archiveThreshold` to balance performance and memory
- Use `summarizeAfter` to compress long conversations
Enable Features Selectively
- `entityExtraction`: For tracking people and things
- `importanceScoring`: For prioritization
- `memoryDecay`: For time-based relevance
Monitor Memory Usage
```typescript
// Regular monitoring
const stats = await memoryManager.getStats();
console.log(`Memory usage: ${stats.totalMemories}`);

// Periodic pruning (recompute the cutoff on each run)
const thirtyDaysAgo = () => new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
setInterval(async () => {
  await memoryManager.prune({ olderThan: thirtyDaysAgo() });
}, 24 * 60 * 60 * 1000); // Daily
```
Session Management
```typescript
// Use one session ID per conversation, generated once at session start
const sessionId = `user-${userId}-${Date.now()}`;

// Clear sessions when done
await memoryManager.clearConversation(sessionId);
await memoryManager.clearContext(sessionId);
```
Examples
Check out the memory examples for complete working code:
- Basic Memory: Core features and memory types
- RAG with Memory: Integration with document retrieval
- Chatbot with Memory: Complete context-aware chatbot
- Shared Memory (RAG + Agent): One `MemoryManager` shared by RAG and Agent in-process (`npm run memory:shared` in the example app)
API Reference
MemoryManager
```typescript
class MemoryManager {
  // Conversation
  addMessage(message: Message, sessionId: string): Promise<string>;
  getConversationHistory(sessionId: string, limit?: number): Promise<Message[]>;
  summarizeConversation(sessionId: string): Promise<string>;
  clearConversation(sessionId: string): Promise<void>;

  // Entity
  trackEntity(entity: Entity): Promise<void>;
  getEntity(name: string): Promise<Entity | null>;
  updateEntity(name: string, updates: Partial<Entity>): Promise<void>;
  getAllEntities(sessionId?: string): Promise<Entity[]>;

  // Facts
  storeFact(fact: string, metadata?: Record<string, any>): Promise<string>;
  recallFacts(query: string, options?: MemorySearchOptions): Promise<string[]>;
  updateFact(id: string, newContent: string): Promise<void>;

  // Working Memory
  setContext(key: string, value: any, sessionId: string): Promise<void>;
  getContext(key: string, sessionId: string): Promise<any>;
  clearContext(sessionId: string): Promise<void>;

  // Search & Stats
  relevantMemories(query: string, options: MemorySearchOptions): Promise<Memory[]>;
  getStats(sessionId?: string): Promise<MemoryStats>;
}
```
RAGPipelineWithMemory
```typescript
class RAGPipelineWithMemory extends RAGPipeline {
  queryWithMemory(
    query: string,
    sessionId: string,
    userId?: string,
    options?: RAGQueryOptions
  ): Promise<RAGResponseWithMemory>;

  queryWithLearning(
    query: string,
    sessionId: string,
    userId?: string,
    options?: RAGQueryOptions
  ): Promise<RAGResponseWithMemory>;

  clearSessionMemory(sessionId: string): Promise<void>;
  getConversationSummary(sessionId: string): Promise<string>;
  storeFact(fact: string, sessionId?: string, userId?: string): Promise<string>;
  recallFacts(query: string, topK?: number): Promise<string[]>;
  getMemoryStats(sessionId?: string): Promise<MemoryStats>;
}
```
Next Steps
- Memory Package — Multi-store memory with in-memory default, categories, and optional Prisma/Redis/vector
- Shared memory — One MemoryManager for RAG and agents; @hazeljs/memory as backend
- Explore RAG Patterns for advanced retrieval strategies
- Check out Vector Stores for storage options
- See the RAG Package for complete API reference