We introduce a zero-shot critical-phase detector (CPD) for frozen vision-language-action policies. CPD couples a temporal-distance latent representation (TLDR) encoder with a self-supervised goal-reachability (G2) labeler, then scores each rollout step by the log-ratio of kernel density estimates fit to the success and failure latent buffers. On LIBERO-Long with frozen π₀ and π₀.₅ backbones, CPD attains a leave-one-out F1 of 0.86–0.89 for trajectory-level failure detection without any task-specific reward or success label. We also prove a consistency bound between the G2 self-label and the ground-truth success predicate under mild assumptions on the latent geometry.
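The per-step scoring rule can be illustrated with a minimal sketch. It is not the paper's implementation: the isotropic Gaussian kernel, the fixed `bandwidth`, the sign convention (failure minus success, so larger means more failure-like), and the names `log_kde` and `log_ratio_scores` are all assumptions made here for illustration, and the latent vectors are toy 2-D points rather than TLDR encoder outputs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_kde(z: Vector, buffer: Sequence[Vector], bandwidth: float) -> float:
    """Log of an isotropic Gaussian kernel density estimate at z."""
    d = len(z)
    # Per-kernel log normalizer for an isotropic Gaussian in d dimensions.
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth ** 2)
    exponents = [
        -0.5 * sum((a - b) ** 2 for a, b in zip(z, point)) / bandwidth ** 2
        for point in buffer
    ]
    # Log-sum-exp keeps the estimate finite when z is far from every point.
    m = max(exponents)
    log_sum = m + math.log(sum(math.exp(e - m) for e in exponents))
    return log_norm + log_sum - math.log(len(buffer))


def log_ratio_scores(
    rollout: Sequence[Vector],
    success_buffer: Sequence[Vector],
    failure_buffer: Sequence[Vector],
    bandwidth: float = 0.5,  # assumed value; not taken from the paper
) -> List[float]:
    """Score each step as log p_fail(z) - log p_succ(z) (assumed sign)."""
    return [
        log_kde(z, failure_buffer, bandwidth) - log_kde(z, success_buffer, bandwidth)
        for z in rollout
    ]


# Toy latents: successes cluster near the origin, failures near (2, 2).
success_buffer = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
failure_buffer = [(2.0, 2.0), (2.1, 2.0), (2.0, 2.1)]
rollout = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

scores = log_ratio_scores(rollout, success_buffer, failure_buffer)
```

In this toy rollout the score rises as the latent drifts from the success cluster toward the failure cluster. How step scores are thresholded or aggregated into a trajectory-level decision is not stated in the abstract and is left out of the sketch.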
@inproceedings{choi2026cpd,
  title     = {Critical-Phase Detection for Vision-Language-Action Policies},
  author    = {Choi, Chanyeok},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2026},
  note      = {Under review},
  keywords  = {vision-language-action, manipulation, failure detection, self-supervised learning, LIBERO}
}