The architecture uses a single socket event for all AI operations, with backend intelligence determining the appropriate processing flow.

LLM Query Flow Architecture
Unified Processing Pipeline
Single Entry Point
- Socket Event Emission - Frontend emits a single event to the Node.js server
- LangGraph Router - Backend analyzes the request and determines the operation type
- Operation Execution - The appropriate handler processes the request
- Response Streaming - Real-time results are streamed back via Socket.IO
- Frontend Display - UI updates with the streamed response (see the sketch below)
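A minimal sketch of the single entry point from the frontend's side, using socket.io-client. The event names (`ai:query`, `ai:chunk`, `ai:done`, `ai:error`) and payload fields are illustrative placeholders, not the actual contract.

```typescript
import { io } from "socket.io-client";

const socket = io("http://localhost:3000");

let answer = "";

// One emission covers every operation type; the backend decides the flow.
socket.emit("ai:query", {
  conversationId: "conv-123",
  message: "Summarize the attached report",
  model: "gpt-4o",          // selected model
  agentId: null,            // set when an agent is chosen
  documentIds: ["doc-42"],  // set when chatting with documents
});

// Chunks arrive as the LLM generates them.
socket.on("ai:chunk", (chunk: string) => {
  answer += chunk;
  // render the partial answer in the UI here
});

socket.on("ai:done", () => console.log("final answer:", answer));
socket.on("ai:error", (err: { message: string }) => console.error(err.message));
```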
Operation Types
1. Normal Chat Flow
Process:
- Frontend emits socket event with user query
- LangGraph analyzes query and calls appropriate LLM
- Response streams back to frontend
- LLM Calls: 1 call (no tools needed)
Automatic Tool Activation (sketched below):
- Web Search (SearxNG): Automatically triggered for search queries
- Image Generation (DALL·E): Activated for image requests
- Vision Processing: Handles uploaded images with vision-enabled models
Model Compatibility:
- Web search supported for all models except GPT-4o latest, DeepSeek, and Qwen
- Image generation available for OpenAI models only
- Vision processing limited to compatible models
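A simplified stand-in for the backend's tool-selection step. The keyword heuristics, model identifiers, and function names are assumptions for illustration; the real decision is made inside the LangGraph router.

```typescript
type Tool = "web_search" | "image_generation" | "vision";

// Model restrictions from the section above; identifier strings are illustrative.
const NO_WEB_SEARCH = ["gpt-4o-latest", "deepseek", "qwen"];

interface QueryContext {
  model: string;
  message: string;
  hasImageAttachment: boolean;
}

// Hypothetical sketch of the tool-selection decision.
function selectTools(ctx: QueryContext): Tool[] {
  const tools: Tool[] = [];
  const model = ctx.model.toLowerCase();

  // Web search: triggered by search-like queries, on supported models only.
  const looksLikeSearch = /\b(search|latest|news|today)\b/i.test(ctx.message);
  if (looksLikeSearch && !NO_WEB_SEARCH.some((m) => model.includes(m))) {
    tools.push("web_search");
  }

  // Image generation (DALL·E): OpenAI models only.
  const looksLikeImageRequest = /\b(draw|generate|image|picture)\b/i.test(ctx.message);
  if (looksLikeImageRequest && model.startsWith("gpt")) {
    tools.push("image_generation");
  }

  // Vision: activated whenever the user attaches an image.
  if (ctx.hasImageAttachment) tools.push("vision");

  return tools;
}
```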
2. Document Chat Flow
File Upload Process:
- Event streaming for optimized uploads to S3
- Parallel processing (2 chunks at a time):
  - Chunk 1: S3 upload
  - Chunk 2: Vector embedding (Pinecone)
- Significantly faster than sequential processing (see the sketch below)
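One way the parallelism described above could look, as a sketch: both helpers are hypothetical placeholders for the real S3 upload and Pinecone upsert, and the batch size of two mirrors the "2 chunks at a time" behavior.

```typescript
interface Chunk { id: string; text: string; }

// Placeholder for the real S3 upload (e.g. an @aws-sdk/client-s3 PutObjectCommand).
async function uploadChunkToS3(chunk: Chunk): Promise<void> {
  /* real implementation uploads chunk bytes to S3 */
}

// Placeholder for embedding the chunk and upserting the vector into Pinecone.
async function embedChunkToPinecone(chunk: Chunk): Promise<void> {
  /* real implementation calls the embedding model, then index.upsert */
}

// Two chunks are in flight at once: uploads and embeddings for the pair
// run concurrently instead of one step at a time.
async function processChunks(chunks: Chunk[]): Promise<void> {
  for (let i = 0; i < chunks.length; i += 2) {
    const batch = chunks.slice(i, i + 2);
    await Promise.all(
      batch.flatMap((c) => [uploadChunkToS3(c), embedChunkToPinecone(c)])
    );
  }
}
```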
Query Process:
- User asks document-related question
- Relevant vectors fetched from Pinecone
- Vectors sent to LLM as context
- Response generated and streamed
- LLM Calls: 1 call with document context
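A sketch of the retrieval step using the Pinecone TypeScript client. The index name, metadata field, and `embedQuery` stub are assumptions; `topK` is an arbitrary choice.

```typescript
import { Pinecone } from "@pinecone-database/pinecone";

// Placeholder: embed the question with the same model used at upload time.
async function embedQuery(question: string): Promise<number[]> {
  return []; // e.g. an OpenAI embeddings call in the real pipeline
}

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index("documents"); // hypothetical index name

async function fetchDocumentContext(question: string): Promise<string> {
  const vector = await embedQuery(question);
  const res = await index.query({ vector, topK: 5, includeMetadata: true });
  // Join the stored chunk text into one context block for the LLM.
  return res.matches
    .map((m) => String(m.metadata?.text ?? ""))
    .join("\n---\n");
}
```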
3. Agent Chat Flow
Process:
- User selects agent from interface
- Agent data fetched from database
- Agent’s system prompt included with user query
- LLM generates response using agent context
- LLM Calls: 1 call with agent prompt
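A minimal sketch of how the agent's stored system prompt could be combined with the user query for the single LLM call; the `Agent` shape is a hypothetical stand-in for the real database record.

```typescript
// Hypothetical shape; field names depend on the actual schema.
interface Agent { id: string; name: string; systemPrompt: string; }

interface ChatMessage { role: "system" | "user"; content: string; }

// The agent's system prompt is prepended to the user query,
// so one LLM call carries the full agent persona.
function buildAgentMessages(agent: Agent, userQuery: string): ChatMessage[] {
  return [
    { role: "system", content: agent.systemPrompt },
    { role: "user", content: userQuery },
  ];
}
```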
4. Agent + Document Flow
Combined Context:
- Agent data retrieved from database
- Relevant document vectors fetched from Pinecone
- Both contexts merged for LLM
- Response incorporates both agent expertise and document knowledge
- LLM Calls: 1 call with combined context
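Building on the agent sketch above, a hedged illustration of merging both contexts: retrieved document chunks are folded into the system context alongside the agent prompt. The shapes and separator format are assumptions.

```typescript
interface RetrievedChunk { text: string; score: number; }

// Reuses Agent and ChatMessage from the previous sketch.
function buildCombinedMessages(
  agent: Agent,
  docChunks: RetrievedChunk[],
  userQuery: string
): ChatMessage[] {
  const docContext = docChunks.map((c) => c.text).join("\n---\n");
  return [
    { role: "system", content: agent.systemPrompt },
    { role: "system", content: `Relevant document excerpts:\n${docContext}` },
    { role: "user", content: userQuery },
  ];
}
```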
Intelligent Backend Routing
LangGraph Decision Making
The backend automatically determines:
- Operation Type: Chat, search, document, agent, or combination
- Tool Requirements: Web search, image generation, vision processing
- Model Capabilities: Ensures model supports requested features
- Context Assembly: Combines appropriate data sources
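The decision LangGraph makes can be pictured as a conditional-edge routing function like the sketch below; the state fields and node names are simplified assumptions, and the actual graph wiring is omitted.

```typescript
type OperationNode = "chat" | "document_chat" | "agent_chat" | "agent_document_chat";

// Simplified request state; the real LangGraph state carries more fields.
interface RouterState {
  agentId: string | null;
  documentIds: string[];
}

// The kind of decision function a conditional edge attaches: it inspects
// the request and returns the name of the node to run next.
function routeOperation(state: RouterState): OperationNode {
  const hasAgent = state.agentId !== null;
  const hasDocs = state.documentIds.length > 0;
  if (hasAgent && hasDocs) return "agent_document_chat";
  if (hasAgent) return "agent_chat";
  if (hasDocs) return "document_chat";
  return "chat"; // plain chat; tools may still be selected downstream
}
```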
Tool Activation Logic
Tools are enabled per request based on query intent and the selected model's capabilities; the backend, not the frontend, makes this decision.
Web Search Independence
SearxNG Integration
- Self-hosted SearxNG metasearch engine
- Works with all supported models (except GPT-4o latest, DeepSeek, Qwen)
- Complete control over search configuration
- Privacy-focused, with no reliance on third-party search APIs
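A minimal query sketch against a self-hosted SearxNG instance. It uses SearxNG's `format=json` parameter, which only works when the JSON output format is enabled in the instance's settings.yml; the base URL and result cap are assumptions.

```typescript
const SEARXNG_URL = "http://localhost:8080"; // hypothetical instance address

interface SearxResult { title: string; url: string; content?: string; }

async function webSearch(query: string): Promise<SearxResult[]> {
  const params = new URLSearchParams({ q: query, format: "json" });
  const res = await fetch(`${SEARXNG_URL}/search?${params}`);
  if (!res.ok) throw new Error(`SearxNG returned ${res.status}`);
  const data = (await res.json()) as { results: SearxResult[] };
  return data.results.slice(0, 5); // top results fed to the LLM as context
}
```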
Frontend
- Single socket event for all operations
- Backend handles all decision-making
- Simplified frontend code
- Unified error handling
Response Streaming
Real-time Delivery
Tokens are forwarded to the client over Socket.IO as the model generates them, so the UI renders partial output immediately (see the sketch below).
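A server-side sketch of the streaming loop, assuming the same hypothetical event names as the frontend sketch earlier; `generateTokens` stands in for whichever model stream LangGraph produced.

```typescript
import { Server } from "socket.io";

// Placeholder for the model call; the real handler streams tokens from
// whichever LLM the router selected.
async function* generateTokens(prompt: string): AsyncIterable<string> {
  for (const word of "This is a streamed reply".split(" ")) yield word + " ";
}

const io = new Server(3000);

io.on("connection", (socket) => {
  // Single entry point: every operation type arrives on one event.
  socket.on("ai:query", async (payload: { message: string }) => {
    try {
      for await (const token of generateTokens(payload.message)) {
        socket.emit("ai:chunk", token); // forward each token immediately
      }
      socket.emit("ai:done");
    } catch (err) {
      socket.emit("ai:error", { message: (err as Error).message });
    }
  });
});
```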
Error Handling
Unified Error Management
| Error Type | Handling |
|---|---|
| Model Unavailable | Automatic fallback to alternative model |
| Token Limit Exceeded | Context truncation with user notification |
| Tool Failure | Graceful degradation, continue without tool |
| Network Issues | Retry logic with exponential backoff |
| Invalid Input | Validation at entry point with clear messaging |
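A sketch combining two rows of the table: exponential-backoff retries for transient network failures, then fallback to an alternative model when the primary stays unavailable. The model names and `callLLM` signature are assumptions.

```typescript
async function callWithResilience(
  callLLM: (model: string, prompt: string) => Promise<string>,
  prompt: string,
  models: string[] = ["gpt-4o", "gpt-4o-mini"], // primary, then fallback
  maxRetries = 3
): Promise<string> {
  for (const model of models) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await callLLM(model, prompt);
      } catch {
        // Exponential backoff: 500ms, 1s, 2s ...
        await new Promise((r) => setTimeout(r, 500 * 2 ** attempt));
      }
    }
    // All retries failed; fall through to the next (fallback) model.
  }
  throw new Error("All models unavailable");
}
```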
Monitoring & Logging
Operation Tracking
- Socket event logging
- LangGraph decision logging
- LLM call metrics
- Response time tracking
- Error rate monitoring
- Token usage analytics
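One plausible shape for a per-request tracking record covering the items above; every field name is illustrative, and console output stands in for a real metrics sink.

```typescript
interface OperationLog {
  operation: "chat" | "document_chat" | "agent_chat" | "agent_document_chat";
  model: string;
  toolsUsed: string[];
  llmCalls: number;
  promptTokens: number;
  completionTokens: number;
  responseTimeMs: number;
  error?: string; // set when the request failed
}

function logOperation(entry: OperationLog): void {
  // In practice this would go to a metrics store; console is a stand-in.
  console.log(JSON.stringify({ ts: new Date().toISOString(), ...entry }));
}
```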
Performance Metrics
- Average response time per operation type
- Tool activation frequency
- Model selection distribution
- Error rates by category
- User satisfaction indicators