Nirav Madhani
2023 - Present

Production RAG System (ARGO DATA)

ML/DL · AI Agents · Systems · RAG

Challenges Solved

  • Real-time Retrieval: Achieved sub-100ms P50 latency for knowledge retrieval using LangChain + Pinecone + FastAPI.
  • Automated Indexing: Built an indexing pipeline that triggers on wiki content changes, re-chunks the affected pages, and updates their embeddings in real time.
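The change-triggered re-indexing idea can be sketched as fixed-size chunking plus content hashing, so only modified chunks are re-embedded. This is an illustrative sketch, not the production pipeline; the function names, chunk sizes, and the positional hash comparison are all assumptions:

```python
import hashlib

def chunk_page(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split wiki page text into overlapping fixed-size chunks for embedding."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

def changed_chunks(old: list[str], new: list[str]) -> list[int]:
    """Return indices of chunks whose content hash differs from the previous
    version, so only those embeddings need regenerating and upserting."""
    digest = lambda c: hashlib.sha256(c.encode()).hexdigest()
    old_hashes = [digest(c) for c in old]
    return [i for i, c in enumerate(new)
            if i >= len(old_hashes) or digest(c) != old_hashes[i]]
```

In a real deployment the diff step matters: re-embedding an entire wiki page on every edit wastes embedding-API calls, while hashing lets the pipeline touch only the chunks that actually changed.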

Signal

Production Scale / Latency Engineering / System Design

System Architecture

The system is designed for high reliability and low latency in a production environment:

  • Ingestion Engine: A FastAPI-based service that monitors internal wiki changes and triggers a processing workflow via Celery/RabbitMQ.
  • Vector Core: Uses Pinecone for high-speed similarity search across millions of documents.
  • RAG Orchestrator: Built with LangChain, it manages the retrieval-inference loop and optimizes how retrieved context is packed into the model's context window.
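One way to picture the context-window step in the loop above is greedy packing: keep the highest-scoring retrieved chunks until the token budget is spent. The helper below is a hypothetical sketch, assuming a simple 4-characters-per-token estimate rather than the orchestrator's actual tokenizer:

```python
def pack_context(chunks: list[tuple[float, str]], budget_tokens: int = 2000) -> list[str]:
    """Greedily select the highest-scoring retrieved chunks that fit within
    the model's context budget, approximating tokens as len(text) // 4."""
    selected, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = max(len(text) // 4, 1)
        if used + cost <= budget_tokens:
            selected.append(text)
            used += cost
    return selected
```

Greedy packing is a deliberate trade-off: it is O(n log n) and predictable under a latency budget, whereas an optimal knapsack-style selection would add cost for marginal quality gains.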

Technical Depth

  • Latency Engineering: Implemented caching layers and optimized embedding generation to hit the <100ms P50 target.
  • Self-Healing Pipeline: Engineered a robust data synchronization layer that handles failed processing attempts with automatic retries and consistency checks.
  • Deployment: Horizontally scalable architecture deployed on Azure, with Kubernetes for container orchestration.
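The caching idea behind the latency target can be sketched as a small TTL cache keyed on query text, so a repeated query skips embedding generation entirely. This is a minimal in-process illustration; a production service at this scale would more likely use a shared cache such as Redis:

```python
import time

class TTLCache:
    """In-process TTL cache for query embeddings: a hit avoids the embedding
    call, removing the largest fixed cost from the retrieval hot path."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expire stale entries lazily on read
            return None
        return value

    def put(self, key: str, value) -> None:
        self._store[key] = (time.monotonic(), value)
```

A short TTL keeps cached embeddings consistent with the re-indexing pipeline: stale answers age out quickly without any explicit invalidation traffic.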

Links

  • Internal Project (ARGO DATA)