Skip to main content

One post tagged with "Vision-Language Models"

View All Tags

Real-Time Object Recognition and Task Execution Robot

· One min read

This project focuses on enabling a robot to perform complex tasks in real-time by leveraging Large Language Models and advanced computer vision.

Key Achievements​

  • Reduced task failure rate by 20% by offloading compute-intensive tasks to a dedicated multi-node computing setup.
  • Improved task sequencing efficiency by 15% by integrating a DINO image grounding model with LangChain tools, allowing a ReAct agent to dynamically plan and execute tasks based on visual input.
  • Integrated LangChain ReAct agents for advanced reasoning and decision-making, enabling the robot to autonomously operate based on real-time data.

Technologies Used: LangChain, ReAct Agents, DINO, ROS, C++, Vision-Language Models, Multi-node Processing.