Ishita Gupta

I'm a second-year Master's student in Robotic Systems Development (MRSD) at Carnegie Mellon University, where I specialize in robotics, deep learning, and the systems engineering required to bring complex robotic projects from concept to reality.

Previously, I was at Addverb for 2.5 years, where I engineered autonomous navigation systems, physics-based simulators, and multi-robot applications. I also interned at Google on the Nest Devices team, automating cloud infrastructure pipelines.

I'm passionate about building systems that perceive, reason, and act in complex environments. My current work explores 3D vision and the application of reinforcement learning to Vision-Language-Action (VLA) models. I am especially interested in using these techniques to solve long-horizon planning problems in robotics and broader AI.

Email / LinkedIn / Google Scholar / GitHub

News

Oct 2025 Secured 3rd place in the CMU VLA Challenge and will be presenting our work at IROS 2025! 🏆

Sep 2025 Conducted an in-person lab at CMU on Recurrent Neural Networks and GRUs. 📚

Aug 2025 Completed my summer internship at Nissan Advanced Technology Center in the Bay Area, focusing on Humanoid Robotics and fine-tuning robotic foundation models. 🌁

Apr 2025 Demonstrated our Autonomous Humanoid Loco-Manipulation for Tote Logistics Capstone Project. 📦 🦾

Mar 2025 Published tutorial videos on How to Read Research Papers, Python Basics, and Distributed Training for students at CMU. 📚

Aug 2024 Started my Master's at CMU's Robotics Institute! 🤖

Jul 2024 Completed 2.5 years at Addverb as a Robotics Software Engineer. 🚀

May 2022 Completed my undergraduate studies at LNMIIT with a B.Tech in Computer Science and Engineering. 🎓

Aug 2021 Completed my internship at Google as a Software Engineering Intern.

Publications

FALCON: Learning Force-Adaptive Humanoid Loco-Manipulation
Y. Zhang, Y. Yuan, P. Gurunath, Ishita Gupta, Tairan He, et al., Guanya Shi
In submission

project page / paper / code

TL;DR: FALCON enables various heavy-duty humanoid loco-manipulation tasks via a new dual-agent force-adaptive RL framework.

Industry Experience

Nissan Advanced Technology Center - Silicon Valley May 2025 - Aug 2025

Robotics Research Intern

Humanoid Robotics Team

Focus: Developing autonomous manipulation and locomotion for the Unitree G1 humanoid robot in manufacturing logistics

PyTorch · Diffusion Models · VLA · PPO/RL · ROS · Computer Vision

Addverb Technologies Jan 2022 - Jul 2024

Robotics Software Engineer

Advanced Robotics & Industrial Automation

Focus: Visual SLAM, physics simulation, and multi-robot control systems for industrial automation

C++ · Python · OpenCV · SLAM · OpenGL · PhysX · ROS · Sensor Fusion

Google May 2021 - Aug 2021

Software Engineering Intern

Nest Devices · Cloud Infrastructure

Focus: Cloud deployment automation and code generation infrastructure for Nest device services

Python · Go · Bash · Cloud Infrastructure · DevOps · Automation

Projects & Research

Autonomous Humanoid Loco-Manipulation for Tote Logistics
CMU MRSD Capstone Project, Fall 2024 - Fall 2025
Advised by Prof. Guanya Shi
Sponsored by Nissan & Field AI
project page / code
Built an autonomy system that fuses 6D pose estimation (NVIDIA FoundationPose) with motion-capture localization for precise perception and navigation. This allowed the Unitree G1 to autonomously manipulate totes and operate effectively in real-world factory workflows.

Training Language Models to Self-Correct via Reinforcement Learning
CMU, Fall 2024
project report
Benchmarked self-correction capabilities across Llama 3.2 1B, Llama 3.1 8B, and Mathstral 7B on the MATH dataset, achieving 41.8% accuracy. Implemented the proposed multi-turn reinforcement learning framework (SCoRe) for fine-tuning LLMs to improve self-correction.

CMU Vision-Language-Action Challenge: Embodied Reasoning System
CMU VLA Challenge (3rd Place), Fall 2025
problem / code
Built a Vision-Language Navigation (VLN) system that answered natural-language queries by combining Gemini 2.5 Pro embodied reasoning with a custom ROS state machine. The system produced numerical answers, object references, or waypoint plans under a strict 10-minute limit.
Key Contributions:
• Natural Language Understanding: Used Gemini 2.5 Pro to classify queries and reason over spatial relations (e.g., "closest to the window")
• State Machine Architecture: Designed a ROS state machine to coordinate exploration, mapping, and answering, with dynamic transitions between states
• Deployment Ready: Deployed the complete system on a real robot in Docker containers, placing 3rd in the challenge

Project Yoriichi: Real-Time Motion Tracking and Imitation Learning for Robotic Fencing
CMU Robot Autonomy Course Project, Spring 2025
video demo
Developed a full-stack motion tracking and imitation learning system enabling a Franka Emika Panda robot to dynamically track and mimic human sword motion. Integrated YOLOv8 for real-time object segmentation with HSV thresholding for pose detection, transforming sword trajectories into the world frame to continuously update the robot's pose for motion imitation.

Spatio-Temporal Transformer for Video Anomaly Detection
CMU, Spring 2024
project report
Re-implemented the LIPINC-V2 Vision Temporal Transformer for deepfake detection, validating it by reproducing the paper's 0.98 AP on LipSyncTIMIT. Engineered a custom 2,300-sample dataset and established a CNN-LSTM baseline achieving 94.9% accuracy, revealing insights into dataset bias and model generalization.

Education

Carnegie Mellon University
Master of Science in Robotic Systems Development (MRSD)
CGPA: 3.83 | August 2024 - May 2026
Coursework: Deep Reinforcement Learning (10-703), Generative AI (10-623)

The LNM Institute of Information Technology (LNMIIT)
Bachelor of Technology (B.Tech) in Computer Science and Engineering
August 2018 - July 2022
Coursework: Artificial Intelligence, NLP, Advanced Algorithms

Teaching Experience

Carnegie Mellon University
Teaching Assistant - Introduction to Deep Learning (11-785)
Spring 2025, Fall 2025
Course Website / YouTube Channel

Responsibilities:
• Lead recitation sections and office hours for 300+ students
• Mentor teams on Deep Learning projects
• Assist in course development and curriculum refinement
• Create educational content and recorded lectures for online learning