
Real-Time Object Recognition and Task Execution Robot

· One min read

This project focuses on enabling a robot to perform complex tasks in real time by combining Large Language Models with computer vision.

Key Achievements

  • Reduced task failure rate by 20% by offloading compute-intensive tasks to a dedicated multi-node computing setup.
  • Improved task sequencing efficiency by 15% by integrating a DINO image grounding model with LangChain tools, allowing a ReAct agent to dynamically plan and execute tasks based on visual input.
  • Integrated LangChain ReAct agents for reasoning and decision-making, enabling the robot to operate autonomously on real-time data.

Technologies Used: LangChain, ReAct Agents, DINO, ROS, C++, Vision-Language Models, Multi-node Processing.
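
To illustrate the reason-act-observe cycle that a ReAct agent follows, here is a minimal sketch in plain Python. It does not use LangChain, DINO, or ROS: the `locate` and `grasp` tools, the hard-coded scene, and the `scripted_policy` function standing in for the LLM are all hypothetical placeholders, not the project's actual code.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)

# Hypothetical tools. A real system would query the grounding model and the robot.
def locate(obj: str) -> str:
    scene = {"cup": (0.4, 0.1), "book": (0.9, 0.3)}  # stand-in for detector output
    return f"{obj} at {scene[obj]}" if obj in scene else f"{obj} not found"

def grasp(obj: str) -> str:
    return f"grasped {obj}"

TOOLS: Dict[str, Callable[[str], str]] = {"locate": locate, "grasp": grasp}

def scripted_policy(goal: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: picks the next (thought, action, argument)."""
    if not history:
        return ("I need to find the object first.", "locate", goal)
    last_observation = history[-1][2]
    if "not found" in last_observation:
        return ("The object is not visible, so I stop.", "finish", "failed")
    if last_observation.startswith("grasped"):
        return ("The task is complete.", "finish", "done")
    return ("The object is located, so I pick it up.", "grasp", goal)

def react_loop(goal: str, policy, max_steps: int = 5) -> Tuple[str, List[Step]]:
    """Alternate reasoning and tool calls until the policy finishes."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, arg = policy(goal, history)
        if action == "finish":
            return arg, history
        observation = TOOLS[action](arg)  # act, then feed the result back
        history.append((thought, f"{action}({arg})", observation))
    return "step limit reached", history
```

Calling `react_loop("cup", scripted_policy)` walks through locate, grasp, and finish. The step limit is a simple guard against an agent that never decides to stop.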

LLM Embodiment in 3D Agent Interacting Within a Virtual World

· One min read

This project explores the embodiment of Large Language Models within a 3D virtual environment built in Unity, allowing for complex task simulation and interaction.

Key Achievements

  • Developed an LLM agent embodiment framework that sped up simulation of complex tasks by 40%.
  • Improved data exchange efficiency between Unity and Python by 25% using JSON-RPC.
  • Constructed a Chain-of-Thought-based ReAct agent, which increased task reasoning accuracy by 35%.

Technologies Used: Unity, Python, LangChain, PyTorch, NVIDIA NIM, JSON-RPC.
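
As a rough picture of what the Python side of a JSON-RPC exchange can look like, the sketch below dispatches a JSON-RPC 2.0 request string to a registered function and returns the response string. The `move_agent` method and its parameters are invented for illustration, and the transport between Unity and Python (socket, HTTP, pipe) is left out because the summary does not specify it.

```python
import json

# Hypothetical method the Unity client might call.
def move_agent(x: float, y: float) -> dict:
    return {"position": [x, y]}

METHODS = {"move_agent": move_agent}

def _error(code: int, message: str, req_id) -> str:
    body = {"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": req_id}
    return json.dumps(body)

def handle(raw: str) -> str:
    """Handle one JSON-RPC 2.0 request and return the JSON response."""
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        return _error(-32700, "Parse error", None)
    if not isinstance(request, dict):
        return _error(-32600, "Invalid Request", None)

    req_id = request.get("id")
    method = METHODS.get(request.get("method"))
    if method is None:
        return _error(-32601, "Method not found", req_id)

    params = request.get("params", {})
    try:
        # JSON-RPC allows named (object) or positional (array) parameters.
        result = method(**params) if isinstance(params, dict) else method(*params)
    except TypeError:
        return _error(-32602, "Invalid params", req_id)
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": req_id})
```

The `id` field is echoed back so the caller can match responses to requests, which matters once several calls are in flight at the same time.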

Multimodal LLM Powered Medicine Reminder App

· One min read

This project is a mobile application that helps users remember to take their medication by processing images of their prescriptions using a multimodal Large Language Model.

Key Achievements

  • Designed an image-to-JSON processing pipeline that reduced image processing time by 40%.
  • Leveraged the Gemini API and LangChain to reduce API response times by 25% and improve backend processing speed by 30%.
  • Deployed on a serverless architecture, decreasing operational costs by 70% and achieving near-instant horizontal scalability.

Technologies Used: LangChain, Serverless Functions, Gemini LLM API, MLOps.
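
One step an image-to-JSON pipeline typically needs is turning the model's text reply into validated records before any reminders are scheduled. The sketch below shows that step only. The `Medication` fields and the reply format are assumptions made for illustration, and the call to the multimodal model is deliberately omitted.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class Medication:
    name: str
    dose: str
    times_per_day: int

def parse_model_output(text: str) -> List[Medication]:
    """Extract and validate a JSON array of medications from a model reply."""
    # Models sometimes wrap JSON in extra prose, so isolate the array first.
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")
    try:
        items = json.loads(text[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc

    medications = []
    for item in items:
        try:
            med = Medication(
                name=str(item["name"]).strip(),
                dose=str(item["dose"]).strip(),
                times_per_day=int(item["times_per_day"]),
            )
        except (KeyError, TypeError, ValueError) as exc:
            raise ValueError(f"malformed medication entry: {item!r}") from exc
        if not med.name or med.times_per_day < 1:
            raise ValueError(f"implausible medication entry: {item!r}")
        medications.append(med)
    return medications
```

Rejecting malformed entries outright, rather than guessing at missing fields, seems the safer default for anything that drives medication reminders.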

TikTok Tech Jam – NLP Powered Search Function for Store

· One min read

This project, developed for a TikTok Tech Jam, is an NLP-powered search engine that improves search relevance and speed for an online store.

Key Achievements

  • Built a search engine that reduced query processing time by 35%, handling up to 500 queries per second.
  • Optimized the embedding generation pipeline using OpenAI and LangChain, cutting CPU usage by 20%.
  • Improved metadata filtering efficiency with a Pinecone schema, reducing storage costs by 25% and increasing query matching accuracy by 10%.

Technologies Used: RAG, LLMOps, MLOps, LangChain, Pinecone, Python.
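
The idea behind metadata filtering in a vector search can be sketched without any external service: restrict the candidates by exact-match metadata first, then rank what remains by similarity to the query embedding. The example below is an in-memory stand-in, not the Pinecone API, and the index contents, field names, and two-dimensional vectors are made up for illustration.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(index: List[dict], query_vec: List[float],
           metadata_filter: Dict[str, object], top_k: int = 2) -> List[str]:
    """Filter by exact-match metadata, then rank survivors by cosine similarity."""
    candidates = [
        item for item in index
        if all(item["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    ranked = sorted(candidates, key=lambda it: cosine(it["values"], query_vec), reverse=True)
    return [it["id"] for it in ranked[:top_k]]

# Toy index: real embeddings would have hundreds of dimensions or more.
INDEX = [
    {"id": "sku-1", "values": [1.0, 0.0], "metadata": {"category": "shoes", "in_stock": True}},
    {"id": "sku-2", "values": [0.8, 0.6], "metadata": {"category": "shoes", "in_stock": True}},
    {"id": "sku-3", "values": [1.0, 0.0], "metadata": {"category": "hats", "in_stock": True}},
    {"id": "sku-4", "values": [0.0, 1.0], "metadata": {"category": "shoes", "in_stock": False}},
]
```

With the query `[1.0, 0.0]` and the filter `{"category": "shoes", "in_stock": True}`, the hat and the out-of-stock shoe are excluded before any similarity is computed, so only the two matching items are scored.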