Dynamic Execution Commitment of Vision-Language-Action Models

A3: Adaptive Action Acceptance for VLA Execution

Feng Chen1*, Xianghui Wang2*, Yuxuan Chen3, Boying Li4, Yefei He5, Zeyu Zhang5, Yicheng Wu6
1University of Adelaide 2Sichuan University 3Shanghai Jiao Tong University 4Monash University 5Zhejiang University 6Imperial College London
* Equal contribution
Figure: Fixed-horizon trade-off. Longer execution horizons reduce forward calls but can sharply degrade task success, motivating adaptive execution commitment.

54% longer LIBERO commitment
+10.2-point robustness gain
84.6% real-world average success

Abstract

Vision-Language-Action (VLA) models often predict a short chunk of future actions in a single forward pass, but deciding how many of those actions to execute before replanning remains a brittle, fixed-horizon choice. A3 (Adaptive Action Acceptance) is a self-speculative prefix verification mechanism for dynamic execution commitment. A3 samples candidate action chunks, estimates trajectory-wise consensus among them, and verifies the selected draft under two constraints: consensus-ordered conditional invariance and prefix-closed sequential consistency. The execution horizon emerges as the longest verified prefix, which eliminates manual horizon tuning while balancing execution robustness against inference throughput.

Method

Figure: Overview of A3. Candidate action chunks are sampled, mapped into an induced trajectory space, scored for consensus, and verified through dual hierarchical constraints to determine the executable prefix.
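The consensus step in this pipeline can be illustrated with a minimal sketch. It assumes, purely for illustration, that each action is an additive delta command, so a chunk's induced trajectory is the running sum of its actions from the current state. Each candidate is then scored by its negative mean distance to the other candidates, and the highest-scoring chunk becomes the draft. The helper names (`induced_trajectory`, `consensus_scores`, `select_draft`) and the distance-based score are hypothetical stand-ins, not the paper's exact consensus measure.

```python
import math
from typing import List, Sequence, Tuple

State = List[float]
Trajectory = List[State]


def induced_trajectory(chunk: Sequence[Sequence[float]], start: Sequence[float]) -> Trajectory:
    """Integrate a chunk of actions into the states it would visit.

    ASSUMPTION (illustrative only): actions are additive deltas in the same space as `start`.
    """
    state = list(start)
    traj: Trajectory = []
    for action in chunk:
        state = [s + a for s, a in zip(state, action)]  # new list each step
        traj.append(state)
    return traj


def trajectory_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean step-wise Euclidean distance between two induced trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def consensus_scores(chunks, start) -> List[float]:
    """Hypothetical score: negative mean distance of each candidate to all others."""
    trajs = [induced_trajectory(c, start) for c in chunks]
    n = len(trajs)
    scores = []
    for i in range(n):
        total = sum(trajectory_distance(trajs[i], trajs[j]) for j in range(n) if j != i)
        scores.append(-total / (n - 1))
    return scores


def select_draft(chunks, start) -> Tuple[int, List[float]]:
    """Pick the highest-consensus candidate as the draft (a medoid-style choice)."""
    scores = consensus_scores(chunks, start)
    best = max(range(len(chunks)), key=lambda i: scores[i])
    return best, scores


# Usage: two candidates roughly agree, the third moves the opposite way.
candidates = [
    [[1.0], [1.0]],
    [[1.0], [1.1]],
    [[-1.0], [-1.0]],
]
draft_index, scores = select_draft(candidates, start=[0.0])
print(draft_index)  # → 0
```

Scoring in trajectory space rather than per-step action space means two chunks whose small action differences accumulate into divergent rollouts are penalized, which mirrors the rollout-level stability the method aims for.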

A3 is organized around three core pieces:

  • Mode-aware trajectory consensus. Candidate chunks are evaluated in induced trajectory space so consensus reflects rollout-level stability rather than raw per-step action variance.
  • Dual hierarchical verification. A3 combines consensus-ordered conditional invariance with prefix-closed sequential consistency.
  • Emergent commitment length. The committed horizon is the longest verified action prefix, produced without task-specific fixed-horizon tuning.
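How verification and commitment length interact can be sketched as follows. Everything here is a hypothetical stand-in for the paper's constraints: a draft step counts as verified only if its induced state lies within a tolerance `tol` of the top-`k` candidates ranked by consensus, and prefix closure means the first failing step ends the prefix, so no later step can be accepted. The committed horizon is the length of that prefix, with a hypothetical floor `min_len` so that at least one action is always executed.

```python
import math
from typing import List

Trajectory = List[List[float]]


def step_agrees(draft: Trajectory, other: Trajectory, t: int, tol: float) -> bool:
    """Hypothetical per-step check: induced states at step t lie within `tol`."""
    return math.dist(draft[t], other[t]) <= tol


def longest_verified_prefix(
    draft: Trajectory,
    ranked_others: List[Trajectory],
    k: int,
    tol: float,
    min_len: int = 1,
) -> int:
    """Return how many draft actions to commit before replanning.

    ranked_others: remaining candidates sorted by consensus score, best first.
    A step is verified only if it agrees with all top-k ranked candidates
    (an illustrative stand-in for conditional invariance). Prefix closure:
    the first failing step terminates the prefix.
    """
    reviewers = ranked_others[:k]
    horizon = len(draft)
    for t in range(horizon):
        if not all(step_agrees(draft, r, t, tol) for r in reviewers):
            return max(t, min_len)
    return horizon  # every step verified: commit the whole chunk


# Usage: reviewers agree for three steps, then the top-ranked one diverges.
draft = [[1.0], [2.0], [3.0], [4.0]]
others = [
    [[1.0], [2.0], [3.05], [5.0]],
    [[1.0], [2.02], [3.0], [4.0]],
]
print(longest_verified_prefix(draft, others, k=2, tol=0.1))  # → 3
```

Because the horizon is simply the index of the first disagreement, it shrinks automatically where candidates diverge and grows where they agree, with no task-specific horizon parameter.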

Results

Dynamic Horizon Comparison

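Backbone | Method | LIBERO Avg. (%) | LIBERO Len. | MetaWorld Avg. (%) | MetaWorld Len. | ManiSkill Avg. (%) | ManiSkill Len.
pi-0 | Original | 95.1 | 5.0 | 78.0 | 3.0 | 77.6 | 5.0
pi-0 | MoH | 95.1 | 5.0 | 79.4 | 3.0 | 78.0 | 5.0
pi-0 | A3 (Ours) | 95.3 | 9.4 | 79.2 | 3.2 | 78.6 | 6.2
pi-0.5 | Original | 97.9 | 6.3 | 77.8 | 3.0 | 88.1 | 5.0
pi-0.5 | MoH | 97.7 | 5.0 | 78.4 | 3.3 | 88.4 | 5.0
pi-0.5 | EverydayVLA | 97.6 | 6.8 | - | - | - | -
pi-0.5 | AutoHorizon | 96.9 | - | - | - | - | -
pi-0.5 | A3 (Ours) | 98.1 | 9.7 | 79.4 | 4.5 | 89.1 | 5.2
GR00T | Original | 90.1 | 4.7 | - | - | - | -
GR00T | A3 (Ours) | 92.9 | 4.5 | - | - | - | -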
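The relative change in committed length can be computed directly from the LIBERO columns. The short sketch below (dictionary and helper names are ours, for illustration) shows that the pi-0.5 row, 9.7 vs. 6.3 actions, gives about +54%, consistent with the 54% highlight. The call-reduction figure rests on a labeled simplifying assumption: if episode length T stays fixed, forward calls scale as T / Len.

```python
# Mean committed length (Len.) on LIBERO, copied from the table above:
# backbone -> (Original, A3)
LIBERO_LEN = {
    "pi-0": (5.0, 9.4),
    "pi-0.5": (6.3, 9.7),
    "GR00T": (4.7, 4.5),
}


def relative_length_gain(base: float, new: float) -> float:
    """Percent change in mean committed horizon."""
    return (new - base) / base * 100.0


def approx_call_reduction(base: float, new: float) -> float:
    """Percent fewer forward calls, ASSUMING episode length T is unchanged,
    so the number of calls scales as T / Len."""
    return (1.0 - base / new) * 100.0


for name, (base, new) in LIBERO_LEN.items():
    print(f"{name}: {relative_length_gain(base, new):+.1f}% length, "
          f"{approx_call_reduction(base, new):+.1f}% change in calls avoided")
```

For pi-0.5 this yields roughly +54.0% length and about 35.1% fewer calls under the stated assumption. For GR00T the committed length is slightly shorter (4.5 vs. 4.7), so both quantities come out negative.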

Real-world Manipulation

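Setting | Exec. Horizon | FlipMug (%) | TapeBox (%) | HangMug (%) | StackCube (%) | Avg. Success (%) | Inference Calls
Fixed | 5 | 50.0 | 35.0 | 25.0 | 0.0 | 27.5 | 91.5
Fixed | 10 | 70.0 | 100.0 | 35.0 | 66.7 | 67.9 | 32.7
Fixed | 15 | 95.0 | 100.0 | 35.0 | 86.7 | 79.2 | 17.7
Fixed | 20 | 90.0 | 95.0 | 35.0 | 73.3 | 73.3 | 12.4
A3 (Ours) | 13.5 | 95.0 | 100.0 | 60.0 | 83.3 | 84.6 | 17.2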
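The real-world numbers can be cross-checked with a few lines of arithmetic. The sketch below (data layout and helper names are ours, for illustration) confirms that each reported average is the mean of the four task success rates up to rounding, and lists the fixed horizons that A3 beats on average success while using no more inference calls.

```python
# Real-world results copied from the table above:
# setting -> (exec horizon, per-task success %, reported avg %, inference calls)
REAL_WORLD = {
    "Fixed 5": (5.0, [50.0, 35.0, 25.0, 0.0], 27.5, 91.5),
    "Fixed 10": (10.0, [70.0, 100.0, 35.0, 66.7], 67.9, 32.7),
    "Fixed 15": (15.0, [95.0, 100.0, 35.0, 86.7], 79.2, 17.7),
    "Fixed 20": (20.0, [90.0, 95.0, 35.0, 73.3], 73.3, 12.4),
    "A3": (13.5, [95.0, 100.0, 60.0, 83.3], 84.6, 17.2),
}


def average_matches(tasks, reported, tol=0.05):
    """True if the reported average equals the task mean up to one-decimal rounding."""
    return abs(sum(tasks) / len(tasks) - reported) < tol


def dominated_by(ref, table):
    """Settings that `ref` beats on average success while using no more inference calls."""
    _, _, ref_avg, ref_calls = table[ref]
    return [
        name
        for name, (_, _, avg, calls) in table.items()
        if name != ref and ref_avg > avg and ref_calls <= calls
    ]


print(dominated_by("A3", REAL_WORLD))  # → ['Fixed 5', 'Fixed 10', 'Fixed 15']
```

Fixed 20 is the only setting that uses fewer calls than A3 (12.4 vs. 17.2), and it does so at lower average success (73.3% vs. 84.6%).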

Robustness Under State Perturbations

Condition | Original Avg. (%) ↑ | Original Len. | A3 (Ours) Avg. (%) ↑ | A3 (Ours) Len. | Δ Avg. | Δ Len.
Clean (no perturbation) | 97.9 | 6.3 | 98.1 | 9.7 | +0.2 | +3.4
Masking 50% | 83.2 | 6.3 | 89.0 | 8.9 | +5.8 | +2.6
Masking 55% | 66.6 | 6.3 | 72.6 | 7.5 | +6.0 | +1.2
Masking 60% | 40.8 | 6.3 | 51.0 | 6.3 | +10.2 | 0
Gaussian blur k=11 | 92.6 | 6.3 | 96.8 | 9.6 | +4.2 | +3.3
Gaussian blur k=13 | 79.6 | 6.3 | 89.6 | 8.9 | +10.0 | +2.6
Gaussian blur k=15 | 55.8 | 6.3 | 66.0 | 8.1 | +10.2 | +1.8
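The Δ entries are plain differences between the two methods. The sketch below (variable names are ours, for illustration) recomputes them from the table at one-decimal precision and shows that the largest success gain, +10.2 points under 60% masking and under blur with k=15, matches the +10.2 robustness gain quoted among the highlights.

```python
CONDITIONS = ["Clean", "Mask 50%", "Mask 55%", "Mask 60%",
              "Blur k=11", "Blur k=13", "Blur k=15"]
# (Avg. success %, Len.) per condition, copied from the table above
ORIGINAL = [(97.9, 6.3), (83.2, 6.3), (66.6, 6.3), (40.8, 6.3),
            (92.6, 6.3), (79.6, 6.3), (55.8, 6.3)]
A3 = [(98.1, 9.7), (89.0, 8.9), (72.6, 7.5), (51.0, 6.3),
      (96.8, 9.6), (89.6, 8.9), (66.0, 8.1)]

# Delta = A3 minus Original, rounded to the table's one-decimal precision
deltas = {
    cond: (round(s1 - s0, 1), round(l1 - l0, 1))
    for cond, (s0, l0), (s1, l1) in zip(CONDITIONS, ORIGINAL, A3)
}

# Largest success gain and the conditions where it occurs
best_gain = max(d_success for d_success, _ in deltas.values())
best_conditions = [c for c, (d_success, _) in deltas.items() if d_success == best_gain]
print(best_gain, best_conditions)  # → 10.2 ['Mask 60%', 'Blur k=15']
```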

Figure: Benchmark result summary. A3 matches or improves success while reducing forward calls through adaptive committed horizons across LIBERO, MetaWorld, and ManiSkill.

Visualizations

Figure: Execution horizon visualizations across real-world tasks. A3 shortens the committed horizon around contact-rich stages such as grasping, placing, and precise alignment, and uses longer horizons during free-space motion.

BibTeX

@article{chen2026dynamic,
  title={Dynamic Execution Commitment of Vision-Language-Action Models},
  author={Chen, Feng and Wang, Xianghui and Chen, Yuxuan and Li, Boying and He, Yefei and Zhang, Zeyu and Wu, Yicheng},
  journal={arXiv preprint arXiv:2605.11567},
  year={2026}
}