One post tagged with "Vision-Language Models"

Real-Time Object Recognition and Task Execution Robot

May 1, 2024 · One min read

This project focuses on enabling a robot to perform complex tasks in real-time by leveraging Large Language Models and advanced computer vision.

Reduced task failure rate by 20% by offloading compute-intensive tasks to a dedicated multi-node computing setup.
Improved task sequencing efficiency by 15% by integrating a DINO image grounding model with LangChain tools, allowing a ReAct agent to dynamically plan and execute tasks based on visual input.
Integrated LangChain ReAct agents for advanced reasoning and decision-making, enabling the robot to autonomously operate based on real-time data.

Technologies Used: LangChain, ReAct Agents, DINO, ROS, C++, Vision-Language Models, Multi-node Processing.