Nirav Madhani
Dec 1, 2024

Vision Model + Agent in Simulated World

AI Agents · ML/DL · Embodied AI · Sim2Real

Challenges Solved

  • Autonomous Navigation: Built an autonomous agent navigating 3D Unity/Gym environments with >85% task completion rate.
  • ReAct Framework: Implemented ReAct framework with LangChain to ground LLM reasoning into spatial navigation actions.
  • Latency Reduction: Reduced decision latency to <200ms P50 using optimized perception-to-action pipelines.
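The latency target above can be checked with a simple benchmark harness around the perception-to-action loop. This is a minimal sketch with stubbed components; `perceive` and `decide` are illustrative stand-ins, not the project's actual modules:

```python
import random
import statistics
import time

def perceive(frame):
    """Stand-in perception module: raw pixels -> state observation."""
    time.sleep(0.001)  # simulate model inference cost
    return {"obstacle_ahead": random.random() < 0.3}

def decide(observation):
    """Stand-in policy: map an observation to a navigation action."""
    return "turn_left" if observation["obstacle_ahead"] else "move_forward"

def p50_latency_ms(n_steps=50):
    """Run the perception-to-action loop and return median latency in ms."""
    latencies = []
    for _ in range(n_steps):
        start = time.perf_counter()
        obs = perceive(None)
        _action = decide(obs)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(latencies)

if __name__ == "__main__":
    print(f"P50 latency: {p50_latency_ms():.1f} ms")
```

Reporting the median (P50) rather than the mean keeps the metric robust to occasional GC pauses or slow frames.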

Signal

Embodied AI / Sim2Real / Agentic Workflow

Project Scope

This project focuses on the intersection of computer vision and autonomous agents. The agent operates in a simulated 3D world (Unity), where it must interpret visual feedback to execute a series of tasks.

Key Components

  • Perception Module: Uses a vision model to convert raw pixels into semantic representations or state observations.
  • ReAct Agent: A "Reason and Act" agent that maintains a chain of thought to plan and execute multi-step actions.
  • Integration Layer: Custom LangChain implementation that bridges the vision model with the agentic reasoning engine.
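The interaction between these components can be sketched as one Reason-and-Act cycle. The project uses LangChain for this layer; the sketch below is a dependency-free approximation, and `vision_model`, `llm_reason`, and `react_step` are illustrative names rather than the actual implementation:

```python
def vision_model(frame):
    """Stand-in perception module: pixels -> semantic observation string."""
    return "door ahead, 3 m; wall to the left"

def llm_reason(prompt):
    """Stand-in for the LLM call; returns a (thought, action) pair."""
    if "door" in prompt:
        return ("The door is reachable; move toward it.", "move_forward")
    return ("Nothing salient in view; scan the room.", "turn_right")

def react_step(frame, history):
    """One ReAct cycle: observe, reason over observation + history, act."""
    observation = vision_model(frame)
    thought, action = llm_reason(
        f"Observation: {observation}\nHistory: {history}"
    )
    # Recording each step grounds later reasoning in what was already seen.
    history.append({"obs": observation, "thought": thought, "act": action})
    return action

history = []
action = react_step(frame=None, history=history)
```

The key design point is that the LLM never sees raw pixels: the perception module compresses the frame into a semantic observation, and the accumulated history acts as the agent's chain of thought across steps.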

Technical Depth

  • Loop Optimization: Heavily optimized the perception-to-action pipeline to ensure real-time responsiveness in the simulation.
  • Sim-to-Real Considerations: Designed the agent to be robust to sensor noise and environmental variations, paving the way for sim-to-real transfer.
  • Task Planning: Implemented structured prompting and memory management to keep the agent focused on long-horizon goals.
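The sensor-noise robustness mentioned above is commonly achieved by perturbing observations during training (a form of domain randomization). A minimal sketch, assuming a flat vector observation; the function name and noise parameters are illustrative:

```python
import random

def add_sensor_noise(observation, sigma=0.05, dropout_p=0.02):
    """Perturb a vector observation with Gaussian noise and random dropout,
    approximating the imperfect readings a real sensor would produce."""
    noisy = []
    for value in observation:
        if random.random() < dropout_p:
            noisy.append(0.0)  # simulate a dropped sensor reading
        else:
            noisy.append(value + random.gauss(0.0, sigma))
    return noisy

clean = [0.8, 0.1, 0.5]
print(add_sensor_noise(clean))
```

Training against such perturbations forces the policy to rely on robust features rather than simulator-exact pixel values, which is what makes the sim-to-real transfer plausible.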
