Challenges Solved
- Real-time Retrieval: Achieved sub-100ms P50 latency for knowledge retrieval using LangChain + Pinecone + FastAPI.
- Automated Indexing: Built an automated indexing pipeline that triggers on wiki content changes to re-chunk and update embeddings in real time.
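
The re-chunking step above can be sketched as follows. This is a minimal, illustrative version: `chunk_text` and `upsert_chunks` are hypothetical names, the overlap-window chunking is an assumed strategy, and the plain dict stands in for the actual embedding + Pinecone upsert.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def upsert_chunks(index: dict, page_id: str, text: str) -> None:
    """Replace a page's old chunks so stale embeddings never linger."""
    # Drop previous entries for this page, then re-insert fresh chunks.
    for key in [k for k in index if k[0] == page_id]:
        del index[key]
    for i, chunk in enumerate(chunk_text(text)):
        # In production this would be: embed(chunk) -> vector upsert to Pinecone.
        index[(page_id, i)] = chunk

index: dict = {}
upsert_chunks(index, "wiki/onboarding", "A" * 450)
```

Deleting a page's old chunks before re-inserting keeps the index consistent when an edit shortens a page (fewer chunks than before).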
Signal
Production Scale / Latency Engineering / System Design
System Architecture
The system is designed for high reliability and low latency in a production environment:
- Ingestion Engine: A FastAPI-based service that monitors internal wiki changes and triggers a processing workflow via Celery/RabbitMQ.
- Vector Core: Uses Pinecone for high-speed similarity search across millions of documents.
- RAG Orchestrator: Built with LangChain, it manages the retrieval-inference loop and optimizes context-window usage.
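
The retrieval-inference loop can be illustrated with a hedged sketch: brute-force cosine similarity stands in for the Pinecone query, and a simple character budget stands in for the context-window optimization. The names `retrieve` and `build_context` are assumptions, not the actual orchestrator API.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], store: list[tuple], k: int = 3) -> list[tuple]:
    """Return the k most similar (text, score) pairs from the store."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

def build_context(hits: list[tuple], budget: int = 500) -> str:
    """Pack top hits into the prompt until the character budget is spent."""
    parts, used = [], 0
    for text, _score in hits:
        if used + len(text) > budget:
            break
        parts.append(text)
        used += len(text)
    return "\n".join(parts)

store = [
    ("doc about deployments", [1.0, 0.0]),
    ("doc about billing", [0.0, 1.0]),
]
hits = retrieve([0.9, 0.1], store, k=2)
```

In the real system, the budget would be measured in tokens against the model's context window rather than characters.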
Technical Depth
- Latency Engineering: Implemented caching layers and optimized embedding generation to hit the <100ms P50 target.
- Self-Healing Pipeline: Engineered a robust data synchronization layer that handles failed processing attempts with automatic retries and consistency checks.
- Deployment: Horizontally scalable architecture deployed on Azure, using Kubernetes for container orchestration.
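
The self-healing retry behavior described above can be sketched like this, assuming exponential backoff and a post-run consistency check; `with_retries`, `flaky_process`, and `consistent` are illustrative names, not the production code.

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.01):
    """Call fn, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}

def flaky_process():
    """Fails twice, then succeeds -- simulates a transient pipeline error."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "indexed"

result = with_retries(flaky_process)

def consistent(source_count: int, index_count: int) -> bool:
    """Consistency check: source and index chunk counts must agree."""
    return source_count == index_count
```

A mismatch in the consistency check would re-enqueue the affected pages rather than letting the index silently drift from the wiki.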
Links
- Internal Project (ARGO DATA)