I'm a second-year Master's student in Robotic Systems Development (MRSD) at Carnegie Mellon University, where I specialize in robotics, deep learning, and the systems engineering required to bring complex robotic projects from concept to reality.
Previously, I was at Addverb for 2.5 years, where I engineered autonomous navigation systems, physics-based simulators, and multi-robot applications. I also interned at Google on the Nest Devices team, automating cloud infrastructure pipelines.
I'm passionate about building systems that perceive, reason, and act in complex environments. My current work explores 3D vision and the application of reinforcement learning to Vision-Language-Action models (VLAs). I am a Graduate Research Assistant at CMU's Robotics Institute, working with Prof. Katerina Fragkiadaki on online RL methods for 3D VLAs. I am especially interested in using these techniques to solve complex, long-horizon planning problems for robotics and broader AI.
Aug 2025: Completed my summer internship at Nissan Advanced Technology Center in the Bay Area, focusing on humanoid robotics and fine-tuning robotic foundation models.
Fine-tuned and improved the NVIDIA GR00T N1.5 VLA model, adding an asynchronous inference architecture and real-time teleoperation/data infrastructure for reliable deployment and scalable training.
Built an autonomy system that fuses 6D pose estimation (NVIDIA FoundationPose) with motion-capture localization for precise perception and navigation. Enabled the Unitree G1 to autonomously manipulate totes and operate effectively in real-world factory workflows.
Online Reinforcement Learning for Robotic Foundation Models | CMU, Fall 2025
Fine-tuned OpenVLA-OFT with GRPO & LoRA, enabling task adaptation beyond SFT on the sparse-reward LIBERO benchmark. Boosted task success from 80% to 98%, preserving a 100Hz control frequency by training a decoupled stochastic policy head.
Semantically Embedded 3D Gaussian Splatting VLAs | CMU, Fall 2025 | Advised by Prof. Shubham Tulsiani
Developed a 3D Gaussian Splatting perception module fused with NVIDIA GR00T to improve grasp reliability and pick-and-place on a Kinova Gen3 arm, showing a 44% improvement over vanilla GR00T.
Trained LLaMA for autonomous self-correction via a two-stage policy gradient framework with KL-constrained initialization and shaped rewards, achieving a 57% reduction in answer instability on MATH500 by mitigating behavior collapse and distribution shift in multi-turn RL.
Built an indoor VLN system that answers natural language queries by combining Gemini 2.5 Pro embodied reasoning with a custom ROS 2 state machine. The system produced numerical answers, object references, or waypoint plans under a strict 10-minute limit.
Developed a full-stack motion tracking and imitation learning system enabling a Franka Emika Panda robot to dynamically track and mimic human sword motion. Integrated YOLOv8 for real-time object segmentation with HSV thresholding for pose detection, transforming sword trajectories into the world frame to continuously update the robot's pose for motion imitation.
Re-implemented the LIPINC-V2 Vision Temporal Transformer for deepfake detection, reproducing the published 0.98 AP on the LipSyncTIMIT benchmark. Engineered a custom 2,300-sample video dataset and established a CNN-LSTM baseline (94.9% accuracy), analyzing dataset bias and temporal model generalization.
Built an Apple Vision Pro teleoperation system and curated 800+ bimanual demonstrations for the Unitree humanoid (28 DoF). Implemented 2D RGB and 3D point-cloud diffusion policies, and fine-tuned NVIDIA GR00T N1.5 with LoRA, improving robustness by 3.3x. Developed RL-based whole-body control for humanoid autonomy (L4DC '26 Oral).
Implemented the backend of an ORB-SLAM system for a quadruped robot, focusing on pose-graph optimization, local bundle adjustment, and keyframe management in GPS-denied environments. Engineered a real-time, thread-safe physics simulator in modern C++ using OpenGL and NVIDIA PhysX, supporting deterministic 100Hz control loops and haptic hardware integration.
Automated the backend cloud pipeline for camera onboarding, reducing a 4-month workflow to a single execution. Built a tool that generated 1,000+ LOC across multiple languages and automated change-list publishing.
Education
Carnegie Mellon University Master of Science in Robotic Systems Development (MRSD)
CGPA: 3.83 | August 2024 - May 2026 Coursework: Diffusion & Flow Matching, Deep Reinforcement Learning (10-703), Generative AI (10-623), Multimodal Machine Learning (11-777), Introduction to Deep Learning (11-785), Learning for 3D Vision (16-825), Robot Mobility, Manipulation, Estimation & Controls, Systems Engineering, Robot Autonomy
The LNM Institute of Information Technology (LNMIIT) Bachelor of Technology (B.Tech) in Computer Science and Engineering
August 2018 - July 2022 Coursework: Probability & Statistics, Artificial Intelligence, NLP, GANs, Advanced Algorithms, Data Structures and Algorithms, Operating Systems, Computer Networks, Computer Architecture, Database Management Systems
Teaching Experience
Carnegie Mellon University Teaching Assistant - Introduction to Deep Learning (11-785)
Spring 2025, Fall 2025 Course Website / YouTube Channel
Responsibilities:
• Lead recitation sections and office hours for 400+ students
• Mentor teams on Deep Learning projects
• Assist in course development and curriculum refinement
• Create educational content and recorded lectures for online learning