Nirav Madhani
Oct 1, 2025

Octo Inference API Deployment

Robotics, AI Agents, MLOps, Model Serving

Challenges Solved

  • Model Deployment: Deployed the Octo-1.5 robotics foundation model on Hugging Face Spaces.
  • Inference Optimization: Reduced inference latency so that the API endpoints can support real-time robot control.
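One way to check a real-time latency claim is to time repeated requests and compare a tail percentile, not the mean, against the robot's control period. The sketch below is illustrative and not the project's code: `measure_latency`, `fits_control_loop`, and `control_hz` are hypothetical names. In practice `call` would wrap one HTTP request to the deployed endpoint, and `n_runs` must be at least 2 for the percentile calculation.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(call: Callable[[], object], n_warmup: int = 3,
                    n_runs: int = 50) -> Dict[str, float]:
    """Time repeated calls to `call` and summarize latency in milliseconds."""
    # Discard warm-up calls: early requests often pay one-off costs such as
    # compilation, cache population, or lazy weight loading.
    for _ in range(n_warmup):
        call()

    samples_ms: List[float] = []
    for _ in range(n_runs):
        start = time.perf_counter()
        call()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    # With n=100, quantiles() returns the 99 cut points for percentiles 1..99,
    # so index 94 is the 95th percentile. "inclusive" treats the samples as
    # the whole population and never extrapolates past the observed max.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": cuts[94],
        "max_ms": max(samples_ms),
    }


def fits_control_loop(stats: Dict[str, float], control_hz: float) -> bool:
    """True if p95 latency fits within one control period (1000 / control_hz ms)."""
    return stats["p95_ms"] <= 1000.0 / control_hz
```

For example, `fits_control_loop(measure_latency(send_request), control_hz=10.0)` checks against a 100 ms budget, where `send_request` is a hypothetical function that posts one observation to the endpoint. Using p95 rather than the average matters for control: a single slow response stalls the robot even when typical latency is low.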

Signal

MLOps / Model Serving

Project Overview

Octo-1.5 is an open-source generalist robot policy: a vision-language-action (VLA) model that maps camera observations and task instructions to robot actions. This project focuses on the MLOps challenge of serving such a model fast enough for interactive use.

Features

  • Standardized API: Created a REST API using FastAPI to handle multimodal inputs (images + text commands).
  • Hugging Face Deployment: Configured and deployed the stack on Hugging Face Spaces using custom Docker containers.
  • Inference Pipeline: Built an optimized pipeline using Hugging Face's transformers and accelerate libraries.
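A multimodal endpoint has to reject malformed input before it reaches the model. In FastAPI this validation would normally live in a Pydantic request model; the sketch below spells the same checks out in plain Python so it runs without a web framework. The JSON field names `image_b64` and `instruction`, and the `ActionRequest`, `BadRequest`, and `parse_action_request` names, are hypothetical and not taken from the project.

```python
import base64
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ActionRequest:
    """One parsed inference request: a camera frame plus a text command."""
    image_bytes: bytes  # encoded image (e.g. JPEG/PNG) decoded from base64
    instruction: str    # natural-language command, e.g. "pick up the cup"


class BadRequest(ValueError):
    """Malformed payload; a web framework would map this to an HTTP 4xx."""


def parse_action_request(body: str) -> ActionRequest:
    """Validate a JSON body with hypothetical fields 'image_b64' and 'instruction'."""
    try:
        payload: Any = json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise BadRequest(f"body is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise BadRequest("body must be a JSON object")

    instruction = payload.get("instruction")
    if not isinstance(instruction, str) or not instruction.strip():
        raise BadRequest("'instruction' must be a non-empty string")

    image_b64 = payload.get("image_b64")
    if not isinstance(image_b64, str):
        raise BadRequest("'image_b64' must be a base64-encoded string")
    try:
        # validate=True rejects characters outside the base64 alphabet
        image_bytes = base64.b64decode(image_b64, validate=True)
    except ValueError as exc:  # binascii.Error subclasses ValueError
        raise BadRequest(f"'image_b64' is not valid base64: {exc}") from exc
    if not image_bytes:
        raise BadRequest("'image_b64' decoded to an empty image")

    return ActionRequest(image_bytes=image_bytes, instruction=instruction.strip())
```

Embedding the image as base64 inside JSON is an assumption made to keep the example simple; multipart file uploads are a common alternative that avoids the roughly one-third size overhead of base64. Decoding the bytes into a pixel array for the model is left out here.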

Technical Depth

  • Mixed-Precision Inference: Ran inference in BF16/FP16 mixed precision, trading numerical precision for lower memory usage and faster inference.
  • Dynamic Batching: Implemented request queueing and dynamic batching to handle multiple concurrent users efficiently.
  • Resource Management: Tuned GPU resource allocation within the Space to prevent out-of-memory (OOM) errors while maintaining throughput.
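The request queueing and dynamic batching described above can be sketched with `asyncio` alone. This is a minimal illustration under stated assumptions, not the project's implementation: `DynamicBatcher`, `batch_fn`, `max_batch_size`, and `max_wait_s` are hypothetical names, and `batch_fn` stands in for a batched forward pass of the model. An async route handler could `await batcher.submit(...)` while a single background task runs `batcher.run()`.

```python
import asyncio
from typing import Any, Callable, List, Sequence, Tuple


class DynamicBatcher:
    """Queue concurrent requests and run them through a batched model call.

    A batch is dispatched when it holds max_batch_size requests or when
    max_wait_s has passed since its first request arrived, whichever is first.
    """

    def __init__(self, batch_fn: Callable[[List[Any]], Sequence[Any]],
                 max_batch_size: int = 8, max_wait_s: float = 0.01) -> None:
        self.batch_fn = batch_fn          # hypothetical batched forward pass
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: "asyncio.Queue[Tuple[Any, asyncio.Future]]" = asyncio.Queue()
        self.batch_sizes: List[int] = []  # kept only for inspection/metrics

    async def submit(self, item: Any) -> Any:
        """Enqueue one request and wait for its own result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self) -> None:
        """Worker loop; start it once as a background task."""
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request is waiting.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            # Collect more requests until the batch is full or the window closes.
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break

            items = [item for item, _ in batch]
            self.batch_sizes.append(len(items))
            try:
                # A synchronous model call blocks the event loop while it runs;
                # a production server might offload it to a worker thread.
                results = list(self.batch_fn(items))
                if len(results) != len(items):
                    raise ValueError("batch_fn must return one result per input")
            except Exception as exc:
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), result in zip(batch, results):
                if not fut.done():  # the caller may have been cancelled
                    fut.set_result(result)
```

The `max_wait_s` window is the central trade-off: a longer window produces larger batches and better GPU utilization, but adds up to that much queueing delay to the first request in each batch, which counts directly against a real-time latency budget.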

Links