Node.js backend implementation for agents supporting both Simple Chat and RAG-style interactions with custom prompts and contextual documents.
Agents provide enhanced chat experiences by automatically selecting between Simple Chat and RAG Chat based on document availability, with all routing handled intelligently by LangGraph.

Processing Flow

1. Agent Selection and Routing

  1. User selects an agent from the interface
  2. Frontend emits a single socket event with the query and agent ID
  3. Node.js backend receives the request
  4. Backend retrieves the agent data from MongoDB
  5. LangGraph analyzes the agent configuration and requirements
Requirements:
  • Agent ID and configuration fully loaded
  • Agent metadata (tools, prompt, document presence) verified
  • Document availability automatically detected
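A minimal sketch of this entry point, assuming Socket.IO and the official MongoDB driver; the event name, payload shape, and collection names are illustrative assumptions, not the confirmed API:

```typescript
import { Server } from "socket.io";
import { MongoClient, ObjectId } from "mongodb";

const io = new Server(3001);
const mongo = new MongoClient(process.env.MONGO_URI ?? "mongodb://localhost:27017");
await mongo.connect();

interface AgentQueryPayload {
  agentId: string; // ID of the agent selected in the interface
  query: string;   // current user input
}

io.on("connection", (socket) => {
  // Single socket event carrying both the query and the agent ID.
  socket.on("agent:query", async ({ agentId, query }: AgentQueryPayload) => {
    // Load the agent's configuration: prompt, tools, linked documents.
    const agent = await mongo
      .db("app")
      .collection("agents")
      .findOne({ _id: new ObjectId(agentId) });

    if (!agent) {
      socket.emit("agent:error", { message: "Agent not found" });
      return;
    }

    // Document availability drives RAG vs Simple Chat routing downstream.
    const hasDocuments = (agent.documents ?? []).length > 0;
    // ... hand `query`, the agent config, and `hasDocuments` to the LangGraph router ...
  });
});
```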

2. Document-Based Path Detection

With Documents (RAG Chat):
  • Agent’s custom system prompt
  • Document retrieval from Pinecone
  • Web Analysis Tool (if supported)
  • Image Generation Tool (OpenAI models only)
  • Web Search Tool (SearxNG; not supported for GPT-4o latest, DeepSeek, or Qwen)
Without Documents (Simple Chat):
  • Agent’s system prompt
  • Standard chat processing
  • Image Generation Tool (OpenAI models only)
  • Web Analysis Tool
  • Web Search Tool (SearxNG)
Implementation:
  • LangGraph automatically detects document presence
  • Backend determines appropriate processing path
  • No frontend decision logic required
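A minimal sketch of this routing, assuming the @langchain/langgraph JavaScript package; the node names and state shape are illustrative assumptions:

```typescript
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Illustrative state: query, agent prompt, linked documents, assembled context.
const AgentState = Annotation.Root({
  query: Annotation<string>,
  agentPrompt: Annotation<string>,
  documentIds: Annotation<string[]>,
  context: Annotation<string>,
});

const graph = new StateGraph(AgentState)
  .addNode("retrieveVectors", async (state) => {
    // Pinecone retrieval happens here in RAG mode (see section 5).
    return { context: "...retrieved chunks..." };
  })
  .addNode("assembleContext", async (state) => {
    return { context: `${state.agentPrompt}\n${state.context ?? ""}` };
  })
  // Document presence is detected in the backend; no frontend logic involved.
  .addConditionalEdges(START, (state) =>
    state.documentIds.length > 0 ? "retrieveVectors" : "assembleContext"
  )
  .addEdge("retrieveVectors", "assembleContext")
  .addEdge("assembleContext", END)
  .compile();
```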

3. Unified Context Assembly

Combined Context Elements:
  • Agent Prompt: Custom agent instructions
  • Chat History: Previous conversation messages
  • User Query: Current user input
  • Documents (if applicable): Retrieved vector chunks
Context Structure:
```
System:    {agent_prompt}
History:   {chat_history}
Documents: {retrieved_chunks}   (if RAG mode)
User:      {query}
```
Token Management:
  • Overflow handled using rolling window strategy
  • Context trimmed automatically when limits approached
  • Priority given to recent messages and relevant documents
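A minimal sketch of the rolling-window trimming, assuming a rough 4-characters-per-token estimate; a production implementation would use the model's actual tokenizer:

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude token estimate; swap in a real tokenizer for accurate budgeting.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function trimHistory(history: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk backwards so the most recent messages are kept first.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > budget) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```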

4. Single LLM Call Processing

Execution:
  1. LangGraph assembles complete context
  2. Single LLM call with all required information
  3. Response generated and streamed to frontend
  4. MongoDB storage via Cost Callback tracker
Data Stored:
  • LLM response content
  • Agent ID and configuration
  • Token cost and usage metrics
  • Processing time and metadata
  • Tool activations (if any)
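A minimal sketch of the record that could be persisted per interaction; the field and collection names are illustrative assumptions:

```typescript
import { MongoClient } from "mongodb";

interface InteractionRecord {
  agentId: string;
  response: string;          // LLM response content
  inputTokens: number;       // token usage metrics
  outputTokens: number;
  costUsd: number;           // cost computed by the Cost Callback tracker
  processingMs: number;      // processing time
  toolsActivated: string[];  // tool activations, if any
  createdAt: Date;
}

async function persistInteraction(client: MongoClient, record: InteractionRecord) {
  await client.db("app").collection<InteractionRecord>("interactions").insertOne(record);
}
```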

5. RAG Implementation (Document-Based Agents)

Document Processing:
  1. Text extraction from uploaded files
  2. Content split into optimized chunks
  3. Embedding generation using configured model
  4. Parallel storage in Pinecone and S3
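A minimal sketch of step 2, assuming fixed-size chunks with overlap; the sizes shown are illustrative, not the system's actual values:

```typescript
// Split extracted text into overlapping chunks before embedding.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}
```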
Query Inference:
  1. Query embedding generated
  2. Pinecone retrieves similar chunks
  3. Chunks combined with agent prompt
  4. Enhanced context sent to LLM
Total LLM Calls: 1 call with all context
Technical Requirements:
  • Consistent embedding model for upload and retrieval
  • Top-k vector search with agent-level metadata filtering
  • Automatic context size management
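A minimal sketch of the retrieval step using the @pinecone-database/pinecone client; the index name, filter key, and embed() helper are illustrative assumptions:

```typescript
import { Pinecone } from "@pinecone-database/pinecone";

// Must be the same embedding model used at upload time.
declare function embed(text: string): Promise<number[]>;

async function retrieveChunks(agentId: string, query: string): Promise<string[]> {
  const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
  const index = pc.index("agent-documents");

  const results = await index.query({
    vector: await embed(query),  // consistent embedding model
    topK: 5,                     // top-k vector search
    filter: { agentId },         // agent-level metadata filtering
    includeMetadata: true,
  });

  return results.matches.map((m) => String(m.metadata?.text ?? ""));
}
```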

Architecture

Agent Architecture Diagram

[Figure: Agent Processing Architecture]

Tool Activation Matrix

| Tool | Simple Chat | RAG Chat | Model Requirement |
| --- | --- | --- | --- |
| Document Retrieval | ✗ | ✓ | Any |
| Web Search (SearxNG) | ✓ | ✓ | All except GPT-4o latest, DeepSeek, Qwen |
| Web Analysis Tool | ✓ | ✓ | Any |
| Image Generation | ✓ | ✓ | OpenAI models only |
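One way to encode this matrix in the backend is a simple capability check; the model identifiers and provider check below are illustrative placeholders:

```typescript
const WEB_SEARCH_UNSUPPORTED = new Set(["gpt-4o-latest", "deepseek", "qwen"]);

type Tool = "documentRetrieval" | "webSearch" | "webAnalysis" | "imageGeneration";

function isToolAvailable(
  tool: Tool,
  model: string,
  provider: string,
  ragMode: boolean
): boolean {
  switch (tool) {
    case "documentRetrieval":
      return ragMode;                            // RAG Chat only
    case "webSearch":
      return !WEB_SEARCH_UNSUPPORTED.has(model); // all except the listed models
    case "imageGeneration":
      return provider === "openai";              // OpenAI models only
    case "webAnalysis":
      return true;                               // available in both modes
  }
}
```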

Key Components

| Component | Purpose |
| --- | --- |
| LangGraph Router | Automatic path selection and routing |
| Agent Repository | Agent configuration and prompt retrieval |
| Document Detector | Identifies RAG vs Simple Chat requirements |
| Pinecone Client | Vector retrieval for document context |
| Cost Tracker | Token usage and pricing tracking |
| MongoDB Handler | Response and metrics persistence |

Backend Intelligence

Automatic Detection

The LangGraph backend identifies:
  • Agent Configuration: Custom prompts and settings
  • Document Availability: RAG vs Simple Chat routing
  • Tool Requirements: Web search, image generation needs
  • Model Capabilities: Validates feature support

Decision Flow

```
Agent Query Received
        ↓
Load Agent Configuration
        ↓
Check Document Availability
        ↓
Documents Present?
    ├─ Yes → RAG Mode
    │        ├─ Retrieve Vectors (Pinecone)
    │        ├─ Assemble Context (Agent + Docs + History)
    │        └─ LLM Call (1 call)
    └─ No  → Simple Chat Mode
             ├─ Assemble Context (Agent + History)
             └─ LLM Call (1 call)
        ↓
Stream Response to Frontend
```

Cost Optimization

Single Call Efficiency

Token Reduction:
  • Previous: Multiple calls with repeated context
  • Current: Single call with optimized context
  • Savings: 40-60% reduction in token usage
Cost Tracking:
  • Input tokens measured
  • Output tokens tracked
  • Cost calculated per interaction
  • Stored in MongoDB for reporting
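A minimal sketch of the per-interaction cost calculation; the pricing entries are illustrative placeholders, not actual rates:

```typescript
interface ModelPricing {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Illustrative pricing table; real rates come from the provider.
const PRICING: Record<string, ModelPricing> = {
  "gpt-4o": { inputPerMillion: 2.5, outputPerMillion: 10 },
};

function calculateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) return 0;
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1_000_000;
}
```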

Web Search Independence

SearxNG Integration

Benefits:
  • No dependency on OpenAI search features
  • Works across multiple model providers
  • Self-hosted for privacy and control
  • Consistent search experience
Model Support:
  • Supported: Most models including GPT-4, Claude, Gemini
  • Not Supported: GPT-4o latest, DeepSeek, Qwen
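A minimal sketch of calling a self-hosted SearxNG instance through its JSON API (format=json must be enabled in the instance settings); the instance URL and result count are assumptions:

```typescript
interface SearxResult {
  title: string;
  url: string;
  content?: string;
}

async function webSearch(query: string): Promise<SearxResult[]> {
  const base = process.env.SEARXNG_URL ?? "http://localhost:8080";
  const res = await fetch(`${base}/search?q=${encodeURIComponent(query)}&format=json`);
  if (!res.ok) throw new Error(`SearxNG returned ${res.status}`);
  const data = (await res.json()) as { results: SearxResult[] };
  return data.results.slice(0, 5); // top results fed to the LLM as context
}
```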

Troubleshooting

Agent RAG Not Triggering

Potential Issues:
  • No documents linked to agent
  • Embeddings not properly generated
  • Pinecone index misconfigured
  • Document metadata filters incorrect
Debug Steps:
  1. Verify agent has linked documents in database
  2. Check embedding generation logs
  3. Validate Pinecone connection and index
  4. Review metadata filtering logic
  5. Test document retrieval independently
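A minimal sketch combining debug steps 1 and 3, assuming the same illustrative collection and index names used above:

```typescript
import { MongoClient, ObjectId } from "mongodb";
import { Pinecone } from "@pinecone-database/pinecone";

async function debugAgentRag(agentId: string) {
  // Step 1: verify the agent has linked documents in the database.
  const mongo = await MongoClient.connect(process.env.MONGO_URI!);
  const agent = await mongo
    .db("app")
    .collection("agents")
    .findOne({ _id: new ObjectId(agentId) });
  console.log("Linked documents:", agent?.documents?.length ?? 0);

  // Step 3: validate the Pinecone connection and index.
  const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
  const stats = await pc.index("agent-documents").describeIndexStats();
  console.log("Vectors in index:", stats.totalRecordCount);

  await mongo.close();
}
```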