A streamlined processing pipeline handles user queries through LangGraph-powered intelligent routing in Node.js.
The architecture uses a single socket event for all AI operations, with the backend determining the appropriate processing flow.
LLM Query Flow Diagram

LLM Query Flow Architecture

Unified Processing Pipeline

Single Entry Point

  1. Socket Event Emission - Frontend emits single event to Node.js server
  2. LangGraph Router - Backend analyzes request and determines operation type
  3. Operation Execution - Appropriate handler processes request
  4. Response Streaming - Real-time results streamed back via Socket.IO
  5. Frontend Display - UI updates with streamed response
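
A minimal sketch of this single entry point on the Node.js side is shown below; the `llm:query`, `llm:chunk`, `llm:done`, and `llm:error` event names and the `runLangGraphRouter` helper are illustrative stand-ins, not the project's actual identifiers.

```typescript
import { Server } from "socket.io";

type QueryPayload = { query: string; fileIds?: string[]; agentId?: string };

// Placeholder for the LangGraph-powered router described in this document.
async function* runLangGraphRouter(payload: QueryPayload): AsyncGenerator<string> {
  yield `echo: ${payload.query}`;
}

const io = new Server(3001, { cors: { origin: "*" } });

io.on("connection", (socket) => {
  // One socket event covers chat, search, document, and agent operations.
  socket.on("llm:query", async (payload: QueryPayload) => {
    try {
      for await (const token of runLangGraphRouter(payload)) {
        socket.emit("llm:chunk", token); // real-time streaming back to the UI
      }
      socket.emit("llm:done");
    } catch (err) {
      socket.emit("llm:error", { message: (err as Error).message });
    }
  });
});
```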

Operation Types

1. Normal Chat Flow

Process:
  • Frontend emits socket event with user query
  • LangGraph analyzes query and calls appropriate LLM
  • Response streams back to frontend
  • LLM Calls: 1 call (no tools needed)
Tool Integration:
  • Web Search (SearxNG): Automatically triggered for search queries
  • Image Generation (DALL·E): Activated for image requests
  • Vision Processing: Handles uploaded images with vision-enabled models
Model Support:
  • Web search supported for all models except: GPT-4o latest, DeepSeek, Qwen
  • Image generation available for OpenAI models
  • Vision processing for compatible models
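
A minimal sketch of the single-call chat path, assuming the OpenAI Node SDK; the model choice and event names are illustrative.

```typescript
import OpenAI from "openai";
import type { Socket } from "socket.io";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Normal chat: exactly one LLM call, streamed straight back to the client.
export async function handleNormalChat(socket: Socket, query: string) {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model choice
    messages: [{ role: "user", content: query }],
    stream: true,
  });

  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    if (token) socket.emit("llm:chunk", token);
  }
  socket.emit("llm:done");
}
```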

2. Document Chat Flow

File Upload Process:
  • Event streaming for optimized uploads to S3
  • Parallel processing (2 chunks at a time):
    • Chunk 1: S3 upload
    • Chunk 2: Vector embedding (Pinecone)
  • Significantly faster than sequential processing
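
A minimal sketch of the parallel upload path, assuming the AWS SDK v3, the Pinecone Node client, and OpenAI embeddings; the bucket, index, and model names are illustrative.

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { Pinecone } from "@pinecone-database/pinecone";
import OpenAI from "openai";

const s3 = new S3Client({ region: "us-east-1" });
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index("documents"); // illustrative index name
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

export async function processUpload(fileId: string, buffer: Buffer, text: string) {
  // S3 upload and vector embedding/upsert run concurrently rather than sequentially.
  await Promise.all([
    s3.send(new PutObjectCommand({ Bucket: "uploads", Key: fileId, Body: buffer })),
    embed(text).then((values) => index.upsert([{ id: fileId, values }])),
  ]);
}
```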
Query Process:
  • User asks document-related question
  • Relevant vectors fetched from Pinecone
  • Vectors sent to LLM as context
  • Response generated and streamed
  • LLM Calls: 1 call with document context
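
A minimal sketch of the document query path, assuming the Pinecone Node client and OpenAI SDK; the index name, metadata field, and model choices are illustrative.

```typescript
import { Pinecone } from "@pinecone-database/pinecone";
import OpenAI from "openai";

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index("documents"); // illustrative index name
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function answerDocumentQuestion(question: string) {
  // Embed the question, then pull the most relevant chunks from Pinecone.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: question,
  });
  const results = await index.query({
    vector: data[0].embedding,
    topK: 5,
    includeMetadata: true,
  });

  // Join the retrieved chunk text (stored in metadata) into a single context block.
  const context = results.matches
    .map((m) => String(m.metadata?.text ?? ""))
    .join("\n---\n");

  // One LLM call with the document context prepended.
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: `Answer using this document context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return completion.choices[0].message.content;
}
```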

3. Agent Chat Flow

Process:
  • User selects agent from interface
  • Agent data fetched from database
  • Agent’s system prompt included with user query
  • LLM generates response using agent context
  • LLM Calls: 1 call with agent prompt
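
A minimal sketch of the agent path, assuming the OpenAI SDK; the `getAgent` lookup is a hypothetical stand-in for the database query.

```typescript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Hypothetical stand-in for fetching the selected agent from the database.
async function getAgent(agentId: string): Promise<{ systemPrompt: string }> {
  return { systemPrompt: "You are a helpful domain expert." };
}

export async function handleAgentChat(agentId: string, query: string) {
  const agent = await getAgent(agentId);
  // Single LLM call: the agent's system prompt frames the user query.
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: agent.systemPrompt },
      { role: "user", content: query },
    ],
  });
  return completion.choices[0].message.content;
}
```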

4. Agent + Document Flow

Combined Context:
  • Agent data retrieved from database
  • Relevant document vectors fetched from Pinecone
  • Both contexts merged for LLM
  • Response incorporates both agent expertise and document knowledge
  • LLM Calls: 1 call with combined context
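
A minimal sketch of assembling the combined context; the message roles and shape are illustrative and would feed the same single LLM call shown above.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Merge the agent's system prompt with retrieved document context into one
// message list; both end up in a single LLM call.
export function buildCombinedMessages(
  agentPrompt: string,
  documentContext: string,
  query: string
): ChatMessage[] {
  return [
    { role: "system", content: agentPrompt },
    { role: "system", content: `Relevant document excerpts:\n${documentContext}` },
    { role: "user", content: query },
  ];
}
```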

Intelligent Backend Routing

LangGraph Decision Making

The backend automatically determines:
  • Operation Type: Chat, search, document, agent, or combination
  • Tool Requirements: Web search, image generation, vision processing
  • Model Capabilities: Ensures model supports requested features
  • Context Assembly: Combines appropriate data sources
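
A minimal sketch of such a router, assuming the `@langchain/langgraph` StateGraph API; the node names and the classification heuristic are illustrative, and the node bodies are stubs.

```typescript
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Shared state flowing through the graph.
const RouterState = Annotation.Root({
  query: Annotation<string>(),
  agentId: Annotation<string | undefined>(),
  fileIds: Annotation<string[] | undefined>(),
  answer: Annotation<string>(),
});

type State = typeof RouterState.State;

// Decide which flow the request belongs to.
function classify(state: State): "agent" | "document" | "chat" {
  if (state.agentId) return "agent";
  if (state.fileIds?.length) return "document";
  return "chat";
}

const graph = new StateGraph(RouterState)
  .addNode("chat", async (s: State) => ({ answer: `chat: ${s.query}` }))
  .addNode("document", async (s: State) => ({ answer: `doc: ${s.query}` }))
  .addNode("agent", async (s: State) => ({ answer: `agent: ${s.query}` }))
  .addConditionalEdges(START, classify, { chat: "chat", document: "document", agent: "agent" })
  .addEdge("chat", END)
  .addEdge("document", END)
  .addEdge("agent", END)
  .compile();

// Usage: const result = await graph.invoke({ query: "Summarise my report", fileIds: ["f1"] });
```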

Tool Activation Logic

Query Analysis

Contains search intent? → Activate SearxNG
Contains image request? → Activate DALL·E
Has uploaded image? → Use vision model
Has document context? → Fetch Pinecone vectors
Has agent selected? → Load agent prompt

Assemble final context → LLM call → Stream response
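
A minimal sketch of this decision step as a plain function; the keyword heuristics are illustrative placeholders, not the production intent classifier.

```typescript
type ToolPlan = {
  webSearch: boolean;
  imageGeneration: boolean;
  vision: boolean;
  documentContext: boolean;
  agentPrompt: boolean;
};

type IncomingQuery = {
  text: string;
  uploadedImage?: boolean;
  fileIds?: string[];
  agentId?: string;
};

// Map the analysed query to the tools that should be activated downstream.
export function planTools(q: IncomingQuery): ToolPlan {
  return {
    webSearch: /\b(search|latest|news|today)\b/i.test(q.text),          // naive intent check
    imageGeneration: /\b(draw|generate an image|picture of)\b/i.test(q.text),
    vision: Boolean(q.uploadedImage),
    documentContext: Boolean(q.fileIds?.length),
    agentPrompt: Boolean(q.agentId),
  };
}
```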

Web Search Independence

SearxNG Integration

  • Self-hosted SearxNG metasearch engine
  • Works with all supported models (except GPT-4o latest, DeepSeek, Qwen)
  • Complete control over search configuration
  • Privacy-focused with no external dependencies
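
A minimal sketch of querying the self-hosted instance, assuming JSON output is enabled in the SearxNG settings; the base URL is illustrative.

```typescript
// Query a self-hosted SearxNG instance over its JSON API.
export async function searchWeb(query: string) {
  const base = process.env.SEARXNG_URL ?? "http://localhost:8080"; // illustrative URL
  const url = `${base}/search?q=${encodeURIComponent(query)}&format=json`;

  const res = await fetch(url);
  if (!res.ok) throw new Error(`SearxNG request failed: ${res.status}`);

  const data = (await res.json()) as {
    results: { title: string; url: string; content: string }[];
  };
  // Keep only the fields the LLM needs as search context.
  return data.results.slice(0, 5).map((r) => ({ title: r.title, url: r.url, snippet: r.content }));
}
```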

Frontend

  • Single socket event for all operations
  • Backend handles all decision-making
  • Simplified frontend code
  • Unified error handling
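
A minimal client-side sketch, assuming socket.io-client and the illustrative event names used above; every operation shares one emit and one error listener.

```typescript
import { io } from "socket.io-client";

const socket = io("http://localhost:3001");

// Every operation goes through the same event; the backend decides the flow.
export function ask(query: string, opts: { agentId?: string; fileIds?: string[] } = {}) {
  socket.emit("llm:query", { query, ...opts });
}

// One error listener covers every operation type.
socket.on("llm:error", ({ message }: { message: string }) => {
  console.error("LLM request failed:", message);
});
```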

Response Streaming

Real-time Delivery

LLM Generation → Token-by-token streaming → Socket.IO transmission → Frontend progressive rendering → Immediate user feedback
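
A minimal sketch of progressive rendering on the client, assuming socket.io-client; the element ID and event names are illustrative.

```typescript
import { io } from "socket.io-client";

const socket = io("http://localhost:3001");
let current = ""; // the message currently being rendered

// Append each streamed token to the visible answer as it arrives.
socket.on("llm:chunk", (token: string) => {
  current += token;
  document.getElementById("answer")!.textContent = current; // progressive rendering
});

socket.on("llm:done", () => {
  current = ""; // ready for the next response
});
```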

Error Handling

Unified Error Management

Error Type           | Handling
Model Unavailable    | Automatic fallback to alternative model
Token Limit Exceeded | Context truncation with user notification
Tool Failure         | Graceful degradation, continue without tool
Network Issues       | Retry logic with exponential backoff
Invalid Input        | Validation at entry point with clear messaging
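
A minimal sketch of the retry path for network issues; the attempt count and delay schedule are illustrative.

```typescript
// Retry a flaky async operation with exponential backoff.
export async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;     // give up after the last attempt
      const delayMs = 500 * 2 ** (attempt - 1);  // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```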

Monitoring & Logging

Operation Tracking

  • Socket event logging
  • LangGraph decision logging
  • LLM call metrics
  • Response time tracking
  • Error rate monitoring
  • Token usage analytics
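
A minimal sketch of per-operation tracking as a timing wrapper, assuming structured console logging; the field names are illustrative.

```typescript
// Wrap an operation handler and record its duration and outcome.
export async function withMetrics<T>(operation: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  let status = "ok";
  try {
    return await fn();
  } catch (err) {
    status = "error";
    throw err;
  } finally {
    console.log(
      JSON.stringify({ operation, status, durationMs: Date.now() - start, ts: new Date().toISOString() })
    );
  }
}
```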

Performance Metrics

  • Average response time per operation type
  • Tool activation frequency
  • Model selection distribution
  • Error rates by category
  • User satisfaction indicators