RAG Chat integrates document understanding with conversational AI: uploaded files are processed through an optimized pipeline, converted to embeddings, and stored in a vector database so that responses can draw on high-relevance document context.
## Processing Pipeline

### 1. Optimized File Upload and Processing

- User uploads a file through the frontend interface
- Backend receives the file via event streaming
- Parallel processing (two concurrent branches; see the sketch after this list):
  - Branch 1: Direct upload to S3 storage
  - Branch 2: Text extraction and embedding generation
- Vector embeddings are stored in Pinecone
- Event streaming keeps uploads non-blocking
- Concurrent S3 upload and embedding generation gives a significant performance improvement over sequential processing
- Progress tracking provides user feedback
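A minimal sketch of the two-branch structure, assuming the AWS SDK v3 S3 client; the bucket name, the naive chunking logic, and the `embedAndStore` helper are hypothetical placeholders for the real pipeline stages:

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" }); // region is an assumption

// Hypothetical stand-ins for stages not detailed in this document.
async function extractText(buffer: Buffer): Promise<string[]> {
  // Naive fixed-size chunking; the real extractor is format-aware.
  return buffer.toString("utf8").match(/[\s\S]{1,1000}/g) ?? [];
}

async function embedAndStore(chunks: string[]): Promise<void> {
  // Generate embeddings and upsert to Pinecone (see section 2).
}

async function processUpload(fileName: string, buffer: Buffer): Promise<void> {
  // The two branches run concurrently; neither blocks the other.
  await Promise.all([
    // Branch 1: persist the raw file to S3.
    s3.send(new PutObjectCommand({ Bucket: "rag-uploads", Key: fileName, Body: buffer })),
    // Branch 2: extract text and generate embeddings.
    extractText(buffer).then(embedAndStore),
  ]);
}
```

Because the slower branch bounds the total time, overlapping the two is what yields the speedup over running them back to back.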
### 2. Vector Storage and Retrieval

- Storage: Embedded chunks are stored in the Pinecone vector database
- Retrieval: Semantic similarity search returns the most relevant chunks
- Optimization: Top-k similarity search with metadata filtering (see the sketch after this list)
- Embeddings are indexed by collection name
- Query embeddings are generated with the same model used for indexing
- Metadata filters provide user/session isolation
- Similarity thresholds are configurable
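A retrieval sketch assuming the official Pinecone Node.js SDK; the index name, the `userId` filter field, the threshold value, and the `embedQuery` helper are assumptions for illustration:

```typescript
import { Pinecone } from "@pinecone-database/pinecone";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index("rag-chat"); // index/collection name is an assumption

// Hypothetical embedder; it must use the same model as indexing did.
async function embedQuery(text: string): Promise<number[]> {
  throw new Error("wire up the embedding model used at index time");
}

async function retrieveChunks(query: string, userId: string) {
  const vector = await embedQuery(query);
  const result = await index.query({
    vector,
    topK: 5,            // top-k similarity search
    filter: { userId }, // metadata filter for user/session isolation
    includeMetadata: true,
  });
  // Configurable similarity threshold, applied client-side.
  const MIN_SCORE = 0.75; // placeholder value
  return result.matches.filter((m) => (m.score ?? 0) >= MIN_SCORE);
}
```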
### 3. Unified Request Processing

- Frontend emits a single socket event with the query
- Node.js backend receives the request
- LangGraph analyzes the query and its context requirements
- Backend automatically detects whether document context is required
- LangGraph routes the request to the appropriate handler
- A single entry point simplifies the frontend code (see the sketch after this list)
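A sketch of the single entry point using Socket.IO; the event names (`chat:query`, `chat:token`, `chat:done`) and the `runGraph` generator standing in for the LangGraph invocation are assumptions:

```typescript
import { Server } from "socket.io";

const io = new Server(3000);

// Hypothetical stand-in for invoking the LangGraph graph, which decides
// whether document context is needed and routes accordingly.
async function* runGraph(query: string, sessionId: string): AsyncGenerator<string> {
  yield `echo (${sessionId}): ${query}`;
}

io.on("connection", (socket) => {
  // Single entry point: one event name covers every chat query.
  socket.on("chat:query", async (payload: { query: string; sessionId: string }) => {
    for await (const token of runGraph(payload.query, payload.sessionId)) {
      socket.emit("chat:token", token); // stream tokens back (section 5)
    }
    socket.emit("chat:done");
  });
});
```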
### 4. Context Assembly

Components retrieved (combined as in the sketch below):
- Relevant chunks: Pinecone similarity search results
- Chat history: previous messages from MongoDB
- User query: the current question or request
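A sketch of how the three sources might be merged into one prompt; the `Chunk` and `Message` shapes and the prompt wording are illustrative assumptions:

```typescript
// Hypothetical shapes for the three context sources.
interface Chunk { text: string; score: number }
interface Message { role: "user" | "assistant"; content: string }

// Combine retrieved chunks, chat history, and the current query into a
// single prompt for the LLM call in step 5.
function assembleContext(chunks: Chunk[], history: Message[], query: string): string {
  const documentContext = chunks.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  const priorTurns = history.map((m) => `${m.role}: ${m.content}`).join("\n");
  return [
    "Answer using the document excerpts below when they are relevant.",
    `Documents:\n${documentContext}`,
    `Conversation so far:\n${priorTurns}`,
    `User: ${query}`,
  ].join("\n\n");
}
```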
### 5. Response Generation

- LangGraph receives the assembled context
- A single LLM call processes all of the information
- The response is generated incorporating document knowledge
- The response is streamed back to the frontend via Socket.IO (see the sketch after this list)
- Total LLM calls: 1 call with document context
- The MongoDB handler records the response
- Token-based cost is calculated via the cost callback
- Usage metrics are stored for reporting
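A streaming-and-cost sketch assuming an OpenAI-compatible client; the document does not name the LLM provider, so the model name, per-token prices, and event names are placeholders:

```typescript
import OpenAI from "openai";
import type { Socket } from "socket.io";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Placeholder per-1K-token prices; real rates come from the cost tracker config.
const PRICE_PER_1K = { input: 0.0005, output: 0.0015 };

async function generateResponse(socket: Socket, prompt: string): Promise<void> {
  // Single LLM call with the assembled context; tokens stream as they arrive.
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini", // model name is an assumption
    messages: [{ role: "user", content: prompt }],
    stream: true,
    stream_options: { include_usage: true }, // final chunk carries token counts
  });

  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content;
    if (token) socket.emit("chat:token", token);
    if (chunk.usage) {
      // Token-based cost calculation, as the cost callback would do.
      const cost =
        (chunk.usage.prompt_tokens / 1000) * PRICE_PER_1K.input +
        (chunk.usage.completion_tokens / 1000) * PRICE_PER_1K.output;
      socket.emit("chat:usage", { ...chunk.usage, cost });
    }
  }
  socket.emit("chat:done");
}
```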
## Architecture

*(Diagram: RAG Chat Processing Flow)*
### Key Components

| Component | Purpose |
|---|---|
| Event Streaming | Non-blocking file uploads |
| S3 Storage | Persistent file storage |
| Embedding Generator | Converts text to vector embeddings |
| Pinecone | Vector similarity search |
| LangGraph | Intelligent routing and context assembly |
| Chat Repository | Conversation history management |
| Cost Tracker | Token usage and pricing calculation |
## File Processing Optimization

### Parallel Processing Architecture

Benefits:
- Speed: 2x faster than sequential processing
- User Experience: Faster file availability
- Resource Efficiency: Better CPU utilization
- Scalability: Handles multiple uploads concurrently
## Query Processing Flow

1. Query Reception: frontend emits a single socket event
2. Vector Retrieval: Pinecone similarity search for relevant chunks
3. Context Assembly: chunks, chat history, and the user query combined
4. Response Generation: one LLM call, response streamed back
## Troubleshooting

### Chunks Not Retrieved from Pinecone

Potential issues:
- Embeddings not properly generated
- Embedding model mismatch between indexing and querying
- Incorrect Pinecone index name
- Metadata filters too restrictive

Resolution steps:
- Verify that embeddings were stored successfully
- Check embedding model consistency between indexing and querying
- Validate the Pinecone connection
- Review similarity threshold settings
- Test without metadata filters (see the diagnostic sketch after this list)
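A diagnostic sketch for the checks above, assuming the official Pinecone Node.js SDK; the index name and the supplied query vector are assumptions:

```typescript
import { Pinecone } from "@pinecone-database/pinecone";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

async function diagnose(queryVector: number[]): Promise<void> {
  const index = pc.index("rag-chat"); // index name is an assumption

  // 1. Were embeddings stored at all? Inspect record counts and dimension.
  const stats = await index.describeIndexStats();
  console.log("index stats:", stats);

  // 2. Does a raw query return matches with no filter and no threshold?
  const unfiltered = await index.query({
    vector: queryVector,
    topK: 5,
    includeMetadata: true,
  });
  console.log("unfiltered scores:", unfiltered.matches.map((m) => m.score));
  // If matches appear here but not in production queries, the metadata
  // filter or the similarity threshold is the likely culprit.
}
```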