A3 | Dynamic Execution Commitment of Vision-Language-Action Models

A³: Adaptive Action Acceptance for VLA Execution

Feng Chen¹*, Xianghui Wang²*, Yuxuan Chen³, Boying Li⁴, Yefei He⁵, Zeyu Zhang⁵, Yicheng Wu⁶

¹University of Adelaide ²Sichuan University ³Shanghai Jiao Tong University ⁴Monash University ⁵Zhejiang University ⁶Imperial College London

* Equal contribution

Abstract

Vision-Language-Action models often predict a short chunk of future actions in one forward pass, but deciding how many actions to execute before replanning remains a brittle fixed-horizon choice. A³ introduces Adaptive Action Acceptance, a self-speculative prefix verification mechanism for dynamic execution commitment. A³ samples candidate action chunks, estimates trajectory-wise consensus, and verifies the selected draft with two constraints: consensus-ordered conditional invariance and prefix-closed sequential consistency. The final execution horizon emerges as the longest verified prefix, eliminating manual horizon tuning while preserving the trade-off between execution robustness and inference throughput.

Method

Overview of A³. Candidate action chunks are sampled, mapped into induced trajectory space, scored for consensus, and verified through dual hierarchical constraints to determine the executable prefix.

A³ is organized around three core pieces:

Mode-aware trajectory consensus. Candidate chunks are evaluated in induced trajectory space so consensus reflects rollout-level stability rather than raw per-step action variance.
Dual hierarchical verification. A³ combines consensus-ordered conditional invariance with prefix-closed sequential consistency.
Emergent commitment length. The committed horizon is the longest verified action prefix, produced without task-specific fixed-horizon tuning.
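
The three pieces above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes 2-D delta actions whose induced trajectory is a cumulative sum, scores consensus as negative mean distance to the other candidates, and replaces the dual hierarchical verification with a simple per-step agreement test. The names `rollout`, `consensus_scores`, `longest_verified_prefix`, `tol`, and `quorum` are all hypothetical.

```python
import math

def rollout(chunk, start=(0.0, 0.0)):
    """Integrate delta actions into positions (assumed action semantics)."""
    x, y = start
    traj = []
    for dx, dy in chunk:
        x, y = x + dx, y + dy
        traj.append((x, y))
    return traj

def traj_distance(a, b):
    """Mean point-wise Euclidean distance between equal-length trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def consensus_scores(trajs):
    """Score each trajectory by its negative mean distance to the others."""
    scores = []
    for i, ti in enumerate(trajs):
        dists = [traj_distance(ti, tj) for j, tj in enumerate(trajs) if j != i]
        scores.append(-sum(dists) / len(dists))
    return scores

def longest_verified_prefix(trajs, scores, tol=0.05, quorum=0.5):
    """Return (draft_index, commit_length).

    The draft is the highest-consensus candidate. Step t is accepted when at
    least `quorum` of the other candidates lie within `tol` of the draft at
    that step. Checking stops at the first rejected step, so the accepted
    steps always form a prefix. This is a stand-in test, not the paper's
    verification rule.
    """
    draft = max(range(len(trajs)), key=scores.__getitem__)
    others = [t for i, t in enumerate(trajs) if i != draft]
    length = 0
    for t in range(len(trajs[draft])):
        agree = sum(math.dist(trajs[draft][t], o[t]) <= tol for o in others)
        if agree / len(others) < quorum:
            break
        length = t + 1
    return draft, max(length, 1)  # always commit at least one action

# Four candidates that agree for three steps and then fan out sideways.
shared = [(0.1, 0.0)] * 3
chunks = [shared + [(0.1, dy)] * 3 for dy in (-0.2, -0.1, 0.1, 0.2)]
trajs = [rollout(c) for c in chunks]
draft, commit = longest_verified_prefix(trajs, consensus_scores(trajs))
print(f"draft candidate {draft}, commit length {commit}")
```

In this toy case the candidates diverge at the fourth step, so only the shared three-step prefix is committed and the policy would replan from there. When all candidates coincide, the full chunk is accepted.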

Results

Dynamic Horizon Comparison

Backbone	Method	LIBERO Avg. (%)	LIBERO Len.	MetaWorld Avg. (%)	MetaWorld Len.	ManiSkill Avg. (%)	ManiSkill Len.
pi-0	Original	95.1	5.0	78.0	3.0	77.6	5.0
pi-0	MoH	95.1	5.0	79.4	3.0	78.0	5.0
pi-0	A³ (Ours)	95.3	9.4	79.2	3.2	78.6	6.2
pi-0.5	Original	97.9	6.3	77.8	3.0	88.1	5.0
pi-0.5	MoH	97.7	5.0	78.4	3.3	88.4	5.0
pi-0.5	EverydayVLA	97.6	6.8	-	-	-	-
pi-0.5	AutoHorizon	96.9	-	-	-	-	-
pi-0.5	A³ (Ours)	98.1	9.7	79.4	4.5	89.1	5.2
GR00T	Original	90.1	4.7	-	-	-	-
GR00T	A³ (Ours)	92.9	4.5	-	-	-	-

Real-world Manipulation

Setting	Exec Horizon	FlipMug	TapeBox	HangMug	StackCube	Avg. Success	Inference Calls
Fixed	5	50.0	35.0	25.0	0.0	27.5	91.5
Fixed	10	70.0	100.0	35.0	66.7	67.9	32.7
Fixed	15	95.0	100.0	35.0	86.7	79.2	17.7
Fixed	20	90.0	95.0	35.0	73.3	73.3	12.4
A³ (Ours)	13.5	95.0	100.0	60.0	83.3	84.6	17.2
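
As a quick check on how to read the table, the Avg. Success column appears to be the unweighted mean of the four per-task success rates. That is an inference from the numbers, not something the source states. The sketch below recomputes it from the per-task values in the table; the row labels and the helper `matches_reported` are hypothetical.

```python
# Per-task success rates (%) for FlipMug, TapeBox, HangMug, StackCube,
# paired with the reported Avg. Success, copied from the table.
rows = {
    "Fixed-5": ([50.0, 35.0, 25.0, 0.0], 27.5),
    "Fixed-10": ([70.0, 100.0, 35.0, 66.7], 67.9),
    "Fixed-15": ([95.0, 100.0, 35.0, 86.7], 79.2),
    "Fixed-20": ([90.0, 95.0, 35.0, 73.3], 73.3),
    "A3": ([95.0, 100.0, 60.0, 83.3], 84.6),
}

def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)

def matches_reported(values, reported, tol=0.05):
    """True when the mean agrees with the reported value to one decimal place."""
    return abs(mean(values) - reported) <= tol + 1e-9

for name, (tasks, reported) in rows.items():
    status = "ok" if matches_reported(tasks, reported) else "mismatch"
    print(f"{name}: mean={mean(tasks):.3f} reported={reported} {status}")
```

Under this reading, the gain of A³ over the best fixed horizon comes mostly from HangMug (60.0 versus 35.0), while the other three tasks are close to the Fixed-15 setting.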

Robustness Under State Perturbations

Method	Original Avg. ↑	Original Len.	Masking 50% Avg. ↑	Masking 50% Len.	Masking 55% Avg. ↑	Masking 55% Len.	Masking 60% Avg. ↑	Masking 60% Len.	Gaussian Blur k=11 Avg. ↑	Gaussian Blur k=11 Len.	Gaussian Blur k=13 Avg. ↑	Gaussian Blur k=13 Len.	Gaussian Blur k=15 Avg. ↑	Gaussian Blur k=15 Len.
Original	97.9	6.3	83.2	6.3	66.6	6.3	40.8	6.3	92.6	6.3	79.6	6.3	55.8	6.3
A³ (Ours)	98.1	9.7	89.0	8.9	72.6	7.5	51.0	6.3	96.8	9.6	89.6	8.9	66.0	8.1
Δ vs. Original	+0.2	+3.4	+5.8	+2.6	+6.0	+1.2	+10.2	0	+4.2	+3.3	+10.0	+2.6	+10.2	+1.8

BibTeX

@article{chen2026dynamic,
  title={Dynamic Execution Commitment of Vision-Language-Action Models},
  author={Chen, Feng and Wang, Xianghui and Chen, Yuxuan and Li, Boying and He, Yefei and Zhang, Zeyu and Wu, Yicheng},
  journal={arXiv preprint arXiv:2605.11567},
  year={2026}
}
