Realtime-VLA V2

The paper “Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate” by Dexmal (arXiv:2603.26360, 2026). Whereas V1 (realtime-vla / arXiv:2510.26742) concentrated on speeding up GPU computation, V2 addresses fast end-to-end execution on real robots. In the demo (https://dexmal.github.io/realtime-vla-v2/), the robot performs tasks such as shirt-folding at speeds comparable to casual human operation.

Core claims

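The paper treats execution faster than the demonstrations as a problem of being Fast, Smooth, and Accurate at the same time. Contributions:

Identification of the core challenges of faster-than-demonstration execution, together with a framework that implements it.
Learning-based optimization of throughput from experience data (human-in-the-loop speed modulation).
An analysis of the upper bound on VLA action execution speed (motion-bounded vs. control-bounded).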

Technical stack

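Latency calibration: measures t_camera, t_readout, t_proprio, and t_motion, and compensates for them by aligning inputs and pre-amplifying commands.
Trajectory post-processing: Speed Adaptation Model → Temporal Optimization (a QP solved with osqp) → Spatial Optimization (MPC with acados, SQP-RTI).
Speed adaptation learning: the operator's throttle inputs are distilled into a regression model.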
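
The latency-calibration item can be illustrated with a small sketch. It rests on assumptions not stated in the source, so it is not Dexmal's implementation. It assumes t_camera + t_readout is the delay from image exposure to the frame reaching the policy. It assumes t_proprio is the delay from sampling a joint state to reporting it, and t_motion is the delay from sending a command to the arm starting to move. It also assumes action-chunk step k targets the image capture time plus k·dt. The names `Latencies`, `aligned_observation`, and `command_to_send` are hypothetical, and command pre-amplification is not modeled.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Latencies:
    """Hypothetical interpretation of the four calibrated delays, in seconds."""
    t_camera: float   # exposure -> frame leaves the camera (assumed meaning)
    t_readout: float  # frame leaves the camera -> frame available to the policy (assumed)
    t_proprio: float  # joint state sampled -> joint state reported (assumed)
    t_motion: float   # command sent -> arm starts moving (assumed)


def interp(samples, t):
    """Linearly interpolate a time-sorted list of (time, value) pairs at t, clamping at the ends."""
    times = [s[0] for s in samples]
    i = bisect_right(times, t)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (t - t0) / (t1 - t0) * (v1 - v0)


def aligned_observation(frame_arrival, proprio_reports, lat):
    """Pair an image with the joint state sampled at the same physical instant."""
    t_capture = frame_arrival - lat.t_camera - lat.t_readout
    # Shift each report back to the moment the joint state was actually sampled.
    sampled = [(t - lat.t_proprio, q) for t, q in proprio_reports]
    return t_capture, interp(sampled, t_capture)


def command_to_send(chunk, t_capture, dt, t_now, lat):
    """Pick the chunk action that targets the moment the arm will react (t_now + t_motion).

    chunk[k] is assumed to target time t_capture + k * dt; the index is clamped to the chunk.
    """
    k = round((t_now + lat.t_motion - t_capture) / dt)
    return chunk[min(max(k, 0), len(chunk) - 1)]
```

With `Latencies(0.02, 0.01, 0.005, 0.04)`, a frame arriving at t = 1.035 s is treated as captured at 1.005 s. At t_now = 1.045 the sketch sends chunk step 4, because the arm will only respond at 1.085 s.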
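
The first two post-processing stages can be approximated with a hypothetical stand-in. It does not solve the osqp QP named above. Instead, a greedy rule gives each waypoint segment the duration dt / speed, where the speed factor is assumed to come from a speed-adaptation model. It stretches a segment whenever its largest joint displacement would exceed an assumed velocity limit v_max, then resamples the path at a fixed control period. The Spatial Optimization (acados MPC) stage is not covered. `retime_chunk` and `resample` are illustrative names.

```python
def retime_chunk(waypoints, dt, speed, v_max):
    """Timestamps for executing `waypoints` `speed` times faster than their nominal spacing dt.

    Greedy rule (not a QP): each segment lasts max(dt / speed, d / v_max), where d is the
    largest single-joint displacement in the segment, so no joint moves faster than v_max.
    """
    times = [0.0]
    for a, b in zip(waypoints, waypoints[1:]):
        d = max(abs(y - x) for x, y in zip(a, b))
        times.append(times[-1] + max(dt / speed, d / v_max))
    return times


def resample(waypoints, times, period):
    """Linearly interpolate the retimed path at a fixed control period (needs >= 2 waypoints)."""
    out, i, k = [], 0, 0
    while k * period <= times[-1] + 1e-12:
        t = k * period
        # Advance to the segment [times[i], times[i + 1]] that contains t.
        while i < len(times) - 2 and times[i + 1] < t:
            i += 1
        t0, t1 = times[i], times[i + 1]
        a = 0.0 if t1 == t0 else min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        out.append([x + a * (y - x) for x, y in zip(waypoints[i], waypoints[i + 1])])
        k += 1
    return out
```

Take the one-joint chunk `[[0.0], [0.1], [0.2], [1.2]]` with dt = 0.1, speed = 2.0, and v_max = 2.0. `retime_chunk` returns times of about `[0.0, 0.05, 0.1, 0.6]`. The two short segments are halved, while the long last segment is stretched by the velocity limit. A real temporal optimizer would trade these constraints off jointly rather than segment by segment.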
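
Distilling the operator's throttle amounts to supervised regression. In this minimal sketch, each logged sample pairs a hypothetical feature vector with the throttle value the operator applied. The features might, for example, be quantities derived from the current observation or the planned actions. The sketch fits a ridge-regularized linear model in plain Python. The linear model class, the features, and the clamp range [0.5, 3.0] are assumptions; the paper's regressor may differ.

```python
def solve(A, c):
    """Solve A w = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][k] * w[k] for k in range(i + 1, n))) / M[i][i]
    return w


def fit_ridge(X, y, lam=1e-3):
    """Fit throttle ~ w . x + b; the L2 penalty lam applies to w, not to the bias b.

    Returns the feature weights followed by the bias as the last element.
    """
    rows = [list(x) + [1.0] for x in X]  # append a bias column
    d = len(rows[0])
    # Normal equations: (R^T R + lam * I_w) w = R^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(d - 1):
        A[i][i] += lam
    c = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve(A, c)


def predict_throttle(w, x, lo=0.5, hi=3.0):
    """Predicted speed multiplier for features x, clamped to a hypothetical safe range."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]  # zip stops before the bias term
    return min(max(s, lo), hi)
```

For example, with one feature and throttle logs `[1.0, 1.5, 2.0, 2.5]` at feature values `0, 1, 2, 3`, `fit_ridge(..., lam=0.0)` recovers a weight of 0.5 and a bias of 1.0. The clamp keeps an extrapolating model from commanding unsafe speeds.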

Hardware

DOS W1 series (RealSense D435 + Airbot Play). Evaluation tasks: shirt-folding, place-into-fixture (0.2 mm margin), and pick-and-latch.
