Blog | Nirav Madhani

Real-Time Object Recognition and Task Execution Robot

May 1, 2024 · One min read

This project focuses on enabling a robot to perform complex tasks in real-time by leveraging Large Language Models and advanced computer vision.

Key Achievements

Reduced task failure rate by 20% by offloading compute-intensive tasks to a dedicated multi-node computing setup.
Improved task sequencing efficiency by 15% by integrating a DINO image grounding model with LangChain tools, allowing a ReAct agent to dynamically plan and execute tasks based on visual input.
Integrated LangChain ReAct agents for advanced reasoning and decision-making, enabling the robot to autonomously operate based on real-time data.
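
As a rough illustration of the reason-act-observe loop described above, here is a minimal sketch in plain Python. It does not use LangChain or a real DINO model: `locate_object`, `pick_up`, `scripted_policy`, and the fake scene are all hypothetical stand-ins, and a real system would replace the scripted policy with an LLM call and the lookup table with detections from a camera frame.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


def locate_object(query: str) -> str:
    """Hypothetical stand-in for the grounding model (text prompt -> box)."""
    fake_scene = {"red cup": (120, 80, 40, 60), "bottle": (300, 90, 30, 90)}
    box = fake_scene.get(query)
    return f"{query} at {box}" if box else f"{query} not found"


def pick_up(target: str) -> str:
    """Hypothetical stand-in for a robot grasp command."""
    return f"picked up {target}"


# Tools the agent is allowed to call, keyed by action name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "locate_object": locate_object,
    "pick_up": pick_up,
}


@dataclass
class Step:
    thought: str
    action: str
    action_input: str
    observation: str = ""


def scripted_policy(goal: str, history: List[Step]) -> Tuple[str, str, str]:
    """Placeholder for the LLM: returns (thought, action, action_input)."""
    if not history:
        return ("I need to find the object first.", "locate_object", goal)
    last = history[-1]
    if last.action == "locate_object" and "not found" not in last.observation:
        return ("The object is visible, so I can grasp it.", "pick_up", goal)
    return ("Nothing more to do.", "finish", last.observation)


def run_agent(goal: str, policy=scripted_policy, max_steps: int = 5) -> List[Step]:
    """Alternate between choosing an action and observing its result."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, action_input = policy(goal, history)
        step = Step(thought, action, action_input)
        if action == "finish":
            step.observation = action_input
            history.append(step)
            break
        step.observation = TOOLS[action](action_input)
        history.append(step)
    return history


if __name__ == "__main__":
    for step in run_agent("red cup"):
        print(f"{step.action}({step.action_input!r}) -> {step.observation}")
```

The point of the structure is that each decision sees the previous observation, so a failed detection changes the plan instead of letting the robot attempt a grasp blindly.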

Technologies Used: LangChain, ReAct Agents, DINO, ROS, C++, Vision-Language Models, Multi-node Processing.

LLM Embodiment in 3D Agent Interacting Within a Virtual World

April 15, 2024 · One min read

This project explores the embodiment of Large Language Models within a 3D virtual environment built in Unity, allowing for complex task simulation and interaction.

Key Achievements

Developed an LLM agent embodiment framework that resulted in a 40% faster simulation of complex tasks.
Improved data exchange efficiency between Unity and Python by 25% using JSON-RPC.
Constructed a Chain-of-Thought-based ReAct agent, which increased task reasoning accuracy by 35%.
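
To make the Unity-to-Python exchange more concrete, the sketch below shows what a Python-side JSON-RPC 2.0 handler could look like. The transport (socket, HTTP, or pipe) is left out, and `move_agent` and its parameters are hypothetical; only the message envelope and the standard error codes come from the JSON-RPC 2.0 specification.

```python
import json
from typing import Any, Callable, Dict

# Registry of callable methods, filled in by the decorator below.
METHODS: Dict[str, Callable[..., Any]] = {}


def rpc_method(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so it can be called by name over JSON-RPC."""
    METHODS[func.__name__] = func
    return func


@rpc_method
def move_agent(x: float, y: float, z: float) -> dict:
    """Hypothetical method: a real handler would forward this to the agent."""
    return {"position": [x, y, z], "ok": True}


def _error(code: int, message: str, req_id: Any) -> str:
    return json.dumps(
        {"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": req_id}
    )


def handle_request(raw: str) -> str:
    """Decode one JSON-RPC 2.0 request string and return the response string."""
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return _error(-32700, "Parse error", None)
    if not isinstance(req, dict) or "method" not in req:
        return _error(-32600, "Invalid Request", None)

    req_id = req.get("id")
    func = METHODS.get(req["method"])
    if func is None:
        return _error(-32601, "Method not found", req_id)

    params = req.get("params", {})
    try:
        # JSON-RPC allows named (object) or positional (array) parameters.
        result = func(**params) if isinstance(params, dict) else func(*params)
    except TypeError:
        # Simplification: any TypeError is reported as bad parameters.
        return _error(-32602, "Invalid params", req_id)
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": req_id})


if __name__ == "__main__":
    request = {"jsonrpc": "2.0", "method": "move_agent",
               "params": {"x": 1, "y": 2, "z": 3}, "id": 7}
    print(handle_request(json.dumps(request)))
```

The Unity side would serialize the same envelope from C# and match each response to its request by `id`.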

Technologies Used: Unity, Python, LangChain, PyTorch, NVIDIA NIM, JSON-RPC.

Multimodal LLM Powered Medicine Reminder App

March 20, 2024 · One min read

This project is a mobile application that helps users remember to take their medication by processing images of their prescriptions using a multimodal Large Language Model.

Key Achievements

Designed an image-to-JSON processing pipeline that reduced image processing time by 40%.
Leveraged the Gemini API and LangChain to reduce API response times by 25% and improve backend processing speed by 30%.
Deployed on a serverless architecture, decreasing operational costs by 70% and achieving near-instant horizontal scalability.
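
The last stage of an image-to-JSON pipeline like the one above is usually turning the model's text reply into validated records. The sketch below covers only that stage and makes no Gemini API call. The field names (`medicine`, `dose`, `times`) and the sample reply are assumptions made for illustration, not the app's actual schema.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Reminder:
    medicine: str
    dose: str
    times: List[str]  # 24-hour "HH:MM" strings, sorted


def _check_time(value: str) -> None:
    """Raise ValueError unless value looks like a valid 24-hour HH:MM time."""
    hours, _, minutes = value.partition(":")
    if not (hours.isdigit() and minutes.isdigit()):
        raise ValueError(f"invalid time: {value!r}")
    if not (0 <= int(hours) < 24 and 0 <= int(minutes) < 60):
        raise ValueError(f"invalid time: {value!r}")


def parse_model_output(text: str) -> List[Reminder]:
    """Extract the JSON array from a model reply and validate each entry."""
    # Models often wrap JSON in extra prose, so slice out the array first.
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")
    items = json.loads(text[start:end + 1])

    reminders = []
    for item in items:
        for value in item["times"]:
            _check_time(value)
        reminders.append(
            Reminder(item["medicine"], item["dose"], sorted(item["times"]))
        )
    return reminders


if __name__ == "__main__":
    # Hypothetical model reply, used only to exercise the parser.
    reply = (
        "Here is the schedule:\n"
        '[{"medicine": "Amoxicillin", "dose": "500 mg", "times": ["20:00", "08:00"]}]'
    )
    print(parse_model_output(reply))
```

Rejecting malformed times at this boundary keeps bad model output from reaching the reminder scheduler.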

Technologies Used: LangChain, Serverless Functions, Gemini LLM API, MLOps.

TikTok Tech Jam – NLP Powered Search Function for Store

February 10, 2024 · One min read

This project, developed for a TikTok Tech Jam, is an NLP-powered search engine that improves search relevance and speed for an online store.

Key Achievements

Built a search engine that reduced query processing time by 35%, handling up to 500 queries per second.
Optimized the embedding generation pipeline using OpenAI and LangChain, cutting CPU usage by 20%.
Improved metadata filtering efficiency with a Pinecone schema, reducing storage costs by 25% and increasing query matching accuracy by 10%.
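
The idea behind metadata filtering can be sketched without a vector database: restrict the candidate set by metadata first, then rank what remains by embedding similarity. The example below is an in-memory stand-in, not Pinecone's API, and the toy index, its two-dimensional vectors, and the `category` and `in_stock` fields are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    index: List[Dict[str, Any]],
    query_vec: Sequence[float],
    metadata_filter: Dict[str, Any],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Filter by exact metadata match, then rank survivors by similarity."""
    candidates = [
        item for item in index
        if all(item["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    scored = [(item["id"], cosine(item["values"], query_vec)) for item in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy index; real product embeddings would have hundreds of dimensions.
INDEX = [
    {"id": "shoe-1", "values": [1.0, 0.0],
     "metadata": {"category": "shoes", "in_stock": True}},
    {"id": "shoe-2", "values": [0.6, 0.8],
     "metadata": {"category": "shoes", "in_stock": False}},
    {"id": "hat-1", "values": [1.0, 0.1],
     "metadata": {"category": "hats", "in_stock": True}},
]

if __name__ == "__main__":
    print(search(INDEX, [1.0, 0.0], {"category": "shoes"}))
```

Here `hat-1` is nearly identical to the query vector but never gets scored, because the category filter removes it before ranking.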

Technologies Used: RAG, LLM Ops, MLOps, LangChain, Pinecone, Python.
