Nirav Madhani
Dec 1, 2024

Vision Model + Agent in Simulated World

AI Agents · ML/DL · Embodied AI · Sim2Real

Challenges Solved

  • Autonomous Navigation: Built an autonomous agent navigating 3D Unity/Gym environments with >85% task completion rate.
  • ReAct Framework: Implemented ReAct framework with LangChain to ground LLM reasoning into spatial navigation actions.
  • Latency Reduction: Reduced decision latency to <200ms P50 using optimized perception-to-action pipelines.
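The latency target above can be checked with a simple benchmark harness around the perception-to-action loop. This is a minimal sketch with stubbed components; `perceive` and `decide` are illustrative stand-ins, not the project's actual modules:

```python
import random
import statistics
import time

def perceive(frame):
    """Stand-in perception module: raw pixels -> state observation."""
    time.sleep(0.001)  # simulate model inference cost
    return {"obstacle_ahead": random.random() < 0.3}

def decide(observation):
    """Stand-in policy: map an observation to a navigation action."""
    return "turn_left" if observation["obstacle_ahead"] else "move_forward"

def p50_latency_ms(n_steps=50):
    """Run the perception-to-action loop and return median latency in ms."""
    latencies = []
    for _ in range(n_steps):
        start = time.perf_counter()
        obs = perceive(None)
        _action = decide(obs)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(latencies)

if __name__ == "__main__":
    print(f"P50 latency: {p50_latency_ms():.1f} ms")
```

Reporting the median (P50) rather than the mean keeps the metric robust to occasional GC pauses or slow frames.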

Signal

Embodied AI / Sim2Real / Agentic Workflow

Project Scope

This project focuses on the intersection of computer vision and autonomous agents. The agent operates in a simulated 3D world (Unity), where it must interpret visual feedback to execute a series of tasks.

Key Components

  • Perception Module: Uses a vision model to convert raw pixels into semantic representations or state observations.
  • ReAct Agent: A "Reason and Act" agent that maintains a chain of thought to plan and execute multi-step actions.
  • Integration Layer: Custom LangChain implementation that bridges the vision model with the agentic reasoning engine.
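The interaction between these components can be sketched as one Reason-and-Act cycle. The project uses LangChain for this layer; the sketch below is a dependency-free approximation, and `vision_model`, `llm_reason`, and `react_step` are illustrative names rather than the actual implementation:

```python
def vision_model(frame):
    """Stand-in perception module: pixels -> semantic observation string."""
    return "door ahead, 3 m; wall to the left"

def llm_reason(prompt):
    """Stand-in for the LLM call; returns a (thought, action) pair."""
    if "door" in prompt:
        return ("The door is reachable; move toward it.", "move_forward")
    return ("Nothing salient in view; scan the room.", "turn_right")

def react_step(frame, history):
    """One ReAct cycle: observe, reason over observation + history, act."""
    observation = vision_model(frame)
    thought, action = llm_reason(
        f"Observation: {observation}\nHistory: {history}"
    )
    # Recording each step grounds later reasoning in what was already seen.
    history.append({"obs": observation, "thought": thought, "act": action})
    return action

history = []
action = react_step(frame=None, history=history)
```

The key design point is that the LLM never sees raw pixels: the perception module compresses the frame into a semantic observation, and the accumulated history acts as the agent's chain of thought across steps.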

Technical Depth

  • Loop Optimization: Heavily optimized the perception-to-action pipeline to ensure real-time responsiveness in the simulation.
  • Sim-to-Real Considerations: Designed the agent to be robust to sensor noise and environmental variations, paving the way for sim-to-real transfer.
  • Task Planning: Implemented structured prompting and memory management to keep the agent focused on long-horizon goals.
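The sensor-noise robustness mentioned above is commonly achieved by perturbing observations during training (a form of domain randomization). A minimal sketch, assuming a flat vector observation; the function name and noise parameters are illustrative:

```python
import random

def add_sensor_noise(observation, sigma=0.05, dropout_p=0.02):
    """Perturb a vector observation with Gaussian noise and random dropout,
    approximating the imperfect readings a real sensor would produce."""
    noisy = []
    for value in observation:
        if random.random() < dropout_p:
            noisy.append(0.0)  # simulate a dropped sensor reading
        else:
            noisy.append(value + random.gauss(0.0, sigma))
    return noisy

clean = [0.8, 0.1, 0.5]
print(add_sensor_noise(clean))
```

Training against such perturbations forces the policy to rely on robust features rather than simulator-exact pixel values, which is what makes the sim-to-real transfer plausible.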
