[Preprint]
We have released a preprint of StreamVLN, a streaming vision-and-language navigation (VLN) framework that uses a hybrid slow-fast context modeling strategy to support multimodal reasoning over interleaved vision, language, and action inputs.
Paper
Project
Video
Code
Data