Construction and Validation of Action-Conditioned VideoGPT

Bibliographic Information

Other Title
  • 行動条件付けVideoGPTの構築と検証

Description

World models acquire the structure of the external world from observations and can predict how its state evolves in response to an agent's actions. Recent advances in generative models and language models have contributed to multi-modal world models, which are expected to find applications in domains such as automated driving and robotics. Video prediction is an area that has made progress in high-fidelity and long-term prediction, and world models have potential applications for acquiring temporal representations. One model architecture that has performed well combines an encoder-decoder latent variable model for image reconstruction with an autoregressive model that predicts the latent sequence. In this work, we extend VideoGPT, a video prediction model built from a VQ-VAE and Image-GPT, by introducing action conditioning. Validation on CARLA and RoboNet showed improved performance compared to the model without conditioning.
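To make the conditioning idea concrete, the sketch below shows one common way to inject per-frame actions into an autoregressive prior over discrete VQ-VAE latent codes. This is a minimal illustrative sketch, not the authors' implementation: the class name, tensor shapes, and the additive conditioning scheme (projecting each frame's action vector and adding it to that frame's latent-token embeddings) are assumptions for illustration only.

```python
# Minimal sketch (not the paper's code) of an action-conditioned autoregressive
# prior over discrete VQ-VAE latent tokens. All names and shapes are hypothetical.
import torch
import torch.nn as nn


class ActionConditionedPrior(nn.Module):
    def __init__(self, vocab_size=1024, d_model=256, n_layers=4, n_heads=8,
                 action_dim=2, tokens_per_frame=64, n_frames=16):
        super().__init__()
        seq_len = tokens_per_frame * n_frames
        self.tokens_per_frame = tokens_per_frame
        self.tok_emb = nn.Embedding(vocab_size, d_model)       # VQ code embeddings
        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, d_model))
        self.act_proj = nn.Linear(action_dim, d_model)          # action -> embedding space
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)               # next-token logits

    def forward(self, codes, actions):
        # codes:   (B, T * tokens_per_frame) discrete indices from the VQ-VAE encoder
        # actions: (B, T, action_dim) one action vector per frame
        B, L = codes.shape
        x = self.tok_emb(codes) + self.pos_emb[:, :L]
        # Broadcast each frame's action embedding over that frame's latent tokens.
        a = self.act_proj(actions)                                   # (B, T, d_model)
        a = a.repeat_interleave(self.tokens_per_frame, dim=1)        # (B, L, d_model)
        x = x + a[:, :L]
        # Causal mask so each latent token attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(L).to(codes.device)
        h = self.transformer(x, mask=mask)
        return self.head(h)
```

Under these assumptions, the prior is trained with a standard next-token cross-entropy loss over the VQ codebook indices, and at inference time latent tokens are sampled autoregressively given the conditioning actions and then decoded back to frames by the VQ-VAE decoder.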
