ECCV 2026
Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models
Vision-Language-Action (VLA) models provide a unified paradigm for robotic manipulation, yet their real-world deployment is often bottlenecked by execution efficiency. While existing efforts predominantly focus on compute-centric efficiency to reduce per-step inference latency, the intrinsic policy efficiency of these models remains largely unexplored. Policy efficiency is fundamentally affected by two factors: the effective executable length of predicted action chunks and the total physical steps required to complete a task.
We observe that current VLA policies suffer from planning unreliability and action redundancy: predictions degrade severely toward the tail of each action chunk, and rollouts tend to include unnecessary physical steps. To address both issues, we propose PolicyTrim, a reinforcement-learning-based post-training framework that extends the reliable action chunk length through dynamic horizon exploration and reduces redundant physical steps through a redundancy-aware step-saving reward.
Extensive experiments across three benchmarks and three VLA models demonstrate that PolicyTrim improves action chunk utilization by 3× and reduces physical execution steps by 51.4%. Ultimately, our framework delivers up to a 5.83× end-to-end deployment speedup without compromising task success rates.
Action chunk utilization improvement: 3×
Physical execution step reduction: 51.4%
End-to-end deployment speedup: up to 5.83×
Stage 1
Reliable Action Chunk Extension
A progressive reliability sweep rewards successful rollouts that sustain longer executable action chunks, pushing the trustworthy prediction frontier toward the usable chunk limit.
Stage 2
A step-saving reward favors successful task completions with fewer physical steps, while stability regularization discourages unreproducible shortcuts.
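The page describes the two stage rewards only in words. As a rough illustration of how such reward shapes could look, here is a toy sketch. Every function name, parameter, and formula in it is an assumption made for illustration and is not taken from PolicyTrim, and the stability regularization mentioned for Stage 2 is not modeled.

```python
def chunk_extension_reward(success: bool, executed_chunk: int, max_chunk: int) -> float:
    """Hypothetical Stage 1 reward: only successful rollouts score,
    and they score more when a longer chunk was executed."""
    if not success:
        return 0.0
    return executed_chunk / max_chunk


def step_saving_reward(success: bool, steps: int, reference_steps: int,
                       weight: float = 0.5) -> float:
    """Hypothetical Stage 2 reward: a success bonus plus a bounded bonus
    for finishing in fewer physical steps than a reference rollout."""
    if not success:
        return 0.0
    saved_fraction = max(0.0, 1.0 - steps / reference_steps)
    return 1.0 + weight * saved_fraction


# A successful rollout with a longer executed chunk is preferred in Stage 1.
print(chunk_extension_reward(True, 10, 15) > chunk_extension_reward(True, 5, 15))
# A successful rollout with fewer steps is preferred in Stage 2.
print(step_saving_reward(True, 60, 100) > step_saving_reward(True, 100, 100))
# Failed rollouts receive nothing, however short they are.
print(step_saving_reward(False, 10, 100))
```

Gating both rewards on success reflects the description above: neither longer chunks nor fewer steps are rewarded unless the task is completed.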
Average success rate (SR), total physical steps (S_total), average action chunk execution length (h_chunk), and end-to-end speedup (Spd↑) across four LIBERO subsets.
| Task | Method | π₀.₅ SR | π₀.₅ S_total | π₀.₅ h_chunk | π₀.₅ Spd↑ | OpenVLA-OFT SR | OpenVLA-OFT S_total | OpenVLA-OFT h_chunk | OpenVLA-OFT Spd↑ | GR00T SR | GR00T S_total | GR00T h_chunk | GR00T Spd↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Spatial | Baseline | 97.8 | 108.3 | 5 | 1.00× | 98.6 | 111.2 | 8 | 1.00× | 91.4 | 67.2 | 5 | 1.00× |
| Spatial | PolicyTrim | 97.8 | 59.8 | 15 | 5.43× | 98.8 | 62.1 | 8 | 1.79× | 92.0 | 56.6 | 10 | 2.37× |
| Object | Baseline | 99.1 | 125.0 | 5 | 1.00× | 98.5 | 135.2 | 8 | 1.00× | 95.0 | 71.3 | 5 | 1.00× |
| Object | PolicyTrim | 98.5 | 64.3 | 15 | 5.83× | 98.5 | 68.8 | 8 | 1.97× | 95.3 | 65.5 | 10 | 2.18× |
| Goal | Baseline | 98.7 | 110.6 | 5 | 1.00× | 97.7 | 118.6 | 8 | 1.00× | 84.2 | 63.3 | 5 | 1.00× |
| Goal | PolicyTrim | 98.8 | 63.5 | 15 | 5.23× | 98.0 | 66.9 | 8 | 1.77× | 86.3 | 60.8 | 10 | 2.08× |
| Long | Baseline | 93.0 | 249.8 | 5 | 1.00× | 92.9 | 249.3 | 8 | 1.00× | 86.1 | 177.9 | 5 | 1.00× |
| Long | PolicyTrim | 93.3 | 171.8 | 10 | 2.91× | 93.1 | 178.3 | 8 | 1.40× | 89.2 | 165.9 | 10 | 2.14× |
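The page does not define how the speedup column is computed. The reported values do, however, appear consistent with a ratio of policy inference calls, where the number of calls is approximated as total physical steps divided by the executed chunk length. The sketch below checks that reading against the π₀.₅ LIBERO rows. The formula and the function names are assumptions inferred from the table, not a definition given by the authors.

```python
def inference_calls(total_steps: float, chunk_len: int) -> float:
    """Approximate number of policy calls: physical steps per executed chunk."""
    return total_steps / chunk_len


def speedup(base_steps: float, base_chunk: int,
            new_steps: float, new_chunk: int) -> float:
    """Assumed speedup: ratio of baseline calls to post-trained calls."""
    return inference_calls(base_steps, base_chunk) / inference_calls(new_steps, new_chunk)


# (baseline steps, baseline chunk, PolicyTrim steps, PolicyTrim chunk, reported speedup)
PI05_LIBERO = {
    "Spatial": (108.3, 5, 59.8, 15, 5.43),
    "Object": (125.0, 5, 64.3, 15, 5.83),
    "Goal": (110.6, 5, 63.5, 15, 5.23),
    "Long": (249.8, 5, 171.8, 10, 2.91),
}

for task, (bs, bh, ns, nh, reported) in PI05_LIBERO.items():
    estimate = speedup(bs, bh, ns, nh)
    print(f"{task}: estimated {estimate:.2f}x, reported {reported:.2f}x")
```

Under this reading, the gain combines fewer physical steps with longer executed chunks. For OpenVLA-OFT, where the chunk length stays at 8, the estimate reduces to the ratio of step counts (111.2 / 62.1 ≈ 1.79 on Spatial).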
PolicyTrim improves both success rates and step efficiency across benchmarks, reaching up to 2.52× speedup on Meta-World and 2.36× on ManiSkill.
| Benchmark | Method | π₀.₅ SR | π₀.₅ S_total | π₀.₅ h_chunk | π₀.₅ Spd↑ | OpenVLA-OFT SR | OpenVLA-OFT S_total | OpenVLA-OFT h_chunk | OpenVLA-OFT Spd↑ |
|---|---|---|---|---|---|---|---|---|---|
| ManiSkill | Baseline | 88.1 | 45.2 | 5 | 1.00× | 60.6 | 53.1 | 8 | 1.00× |
| ManiSkill | PolicyTrim | 89.8 | 38.3 | 10 | 2.36× | 63.2 | 46.7 | 8 | 1.14× |
| Meta-World | Baseline | 65.1 | 66.3 | 5 | 1.00× | not evaluated | | | |
| Meta-World | PolicyTrim | 65.4 | 52.6 | 10 | 2.52× | not evaluated | | | |
PolicyTrim transfers its efficiency gains to physical deployment on an Agilex Piper arm, maintaining or improving success rates while achieving 1.86× average wall-clock speedup under the standard real-world setting.
| Method | Std. SR: Flip | Std. SR: Hang | Std. SR: Tape | Dyn. SR: Flip | Dyn. SR: Tape | Time: Flip | Time: Hang | Time: Tape |
|---|---|---|---|---|---|---|---|---|
| Baseline | 70 | 60 | 95 | 70 | 65 | 14.6 | 15.6 | 17.5 |
| PolicyTrim | 75 | 65 | 95 | 70 | 70 | 7.6 | 8.7 | 9.4 |
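The 1.86× average wall-clock speedup can be reproduced from the Time columns of this table. The sketch below assumes the average is the mean of the per-task time ratios, which the page does not state explicitly. The time unit is also not given (presumably seconds), but the ratios do not depend on it.

```python
from statistics import mean

# Completion times from the Time columns (unit not stated on the page).
BASELINE = {"Flip": 14.6, "Hang": 15.6, "Tape": 17.5}
POLICYTRIM = {"Flip": 7.6, "Hang": 8.7, "Tape": 9.4}

# Per-task speedup is the baseline time divided by the PolicyTrim time.
per_task = {task: BASELINE[task] / POLICYTRIM[task] for task in BASELINE}
average_speedup = mean(per_task.values())

for task, ratio in per_task.items():
    print(f"{task}: {ratio:.2f}x")
print(f"average: {average_speedup:.2f}x")
```

Dividing the summed baseline time by the summed PolicyTrim time (47.7 / 25.7) gives nearly the same figure, so the table alone does not reveal which averaging the authors used.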
PolicyTrim also generalizes beyond the standard OpenVLA-OFT setting, improving both re-pretrained parallel-decoding OpenVLA-OFT and autoregressive OpenVLA.
| Model | Method | SR | Step | h | Spd↑ |
|---|---|---|---|---|---|
| OpenVLA-OFT | Baseline | 98.6 | 111.2 | 8 | 1.00× |
| OpenVLA-OFT | S1+S2 | 98.8 | 65.4 | 14 | 2.97× |
| OpenVLA | Baseline | 84.7 | 113.5 | — | 1.00× |
| OpenVLA | S2 | 87.0 | 80.6 | — | 1.41× |
Real-world rollout comparison under the same task setting, with the completion time of each rollout:
Baseline: 14.3 s | PolicyTrim: 6.5 s
Baseline: 15.1 s | PolicyTrim: 7.4 s
Baseline: 18.6 s | PolicyTrim: 7.5 s
@inproceedings{policytrim2026,
title = {PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models},
author = {Xianghui Wang and Feng Chen and Wenbo Zhang and Hua Yan and Zixuan Wang and Changsheng Li and Yinjie Lei},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2026}
}