Robustifying Vision Transformer Without Retraining From Scratch Using Attention Based Test-Time Adaptation
- KOJIMA Takeshi (Graduate School of Engineering, The University of Tokyo)
- IWASAWA Yusuke (Graduate School of Engineering, The University of Tokyo)
- MATSUO Yutaka (Graduate School of Engineering, The University of Tokyo)
Description
Vision Transformer (ViT) is becoming increasingly popular in the field of image processing. This study aims to improve robustness against unknown perturbations without retraining the ViT model from scratch. Since our approach does not alter the training phase, it avoids repeating the computationally heavy pre-training of ViT. Specifically, we use test-time adaptation, in which the model corrects its own predictions during test time. We first show that the existing test-time adaptation method Tent, which had previously been validated only for CNN models, is also applicable to ViT with proper parameter tuning and gradient clipping. However, we observed that Tent sometimes fails catastrophically, especially under severe perturbations. To stabilize the optimization, we propose a new loss function called Attent, which minimizes the distributional differences of the attention entropy between the source and the target. Experiments on image classification with CIFAR-10-C, CIFAR-100-C, and ImageNet-C show that both Tent and Attent are effective across a wide variety of corruptions. The results also show that combining Attent and Tent further improves classification accuracy on corrupted data.
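The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the Tent baseline it builds on: one entropy-minimization step on an unlabeled test batch, updating only the LayerNorm affine parameters, with gradient clipping for stability as the abstract suggests. The function names, choice of optimizer, learning rate, and clipping threshold are all assumptions for illustration, not the authors' code; the proposed Attent loss would additionally require matching attention-entropy statistics between source and target, which depends on model internals the abstract does not describe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def collect_norm_params(model: nn.Module):
    """Collect the affine parameters of LayerNorm modules: the small
    parameter set that Tent-style adaptation typically updates in a ViT."""
    return [p for m in model.modules() if isinstance(m, nn.LayerNorm)
            for p in m.parameters() if p.requires_grad]

def tent_step(model, x, optimizer, max_grad_norm=1.0):
    """One Tent-style step on an unlabeled test batch: minimize the mean
    entropy of the softmax predictions, clip gradients, then update."""
    logits = model(x)
    log_probs = F.log_softmax(logits, dim=1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    # Gradient clipping for stability (max_grad_norm is an assumed value).
    torch.nn.utils.clip_grad_norm_(
        [p for g in optimizer.param_groups for p in g["params"]],
        max_grad_norm)
    optimizer.step()
    return logits.detach()

# Hypothetical usage on a pre-trained ViT and a corrupted test loader:
#   params = collect_norm_params(vit_model)
#   optimizer = torch.optim.SGD(params, lr=1e-3)  # lr is an assumption
#   for x, _ in corrupted_test_loader:
#       preds = tent_step(vit_model, x, optimizer).argmax(dim=1)
```

Restricting the update to normalization parameters keeps adaptation cheap and reduces the risk of the collapse the abstract reports under severe perturbations, though the paper's stabilizing Attent loss is not reproduced here.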
Journal
- Proceedings of the Annual Conference of JSAI, JSAI2022 (0), 3S3IS2e04-3S3IS2e04, 2022
- The Japanese Society for Artificial Intelligence
Details
- CRID: 1390855656024656384
- ISSN: 2758-7347
- Text Lang: ja
- Data Source: JaLC
- Abstract License Flag: Disallowed