Session
Session B: 12:00-2:00PM
Poster Assignment
112
Department
Computer Science - Engineering
Presenter(s)
Jeremi Nuer
Mentor(s)
James Preiss
Title
Hierarchical Vision Language Action Models for Robotic Intelligence
Abstract
Vision-Language-Action (VLA) models use large pre-trained vision-language backbones for robotic control. Because inference with these backbones is slow, hierarchical VLAs, in which a slow vision-language model conditions a fast action expert, have emerged as a superior model class for high-frequency control. Although their performance benefits appear to stem from faster inference, we find that the training dynamics are also fundamentally altered, and we demonstrate how varying latency and speed at train time affects downstream performance. We also use interpretability-based probes to determine which components are most responsible for different model functions.