Clean-Label Backdoor Attack Using Physically Realizable Features as Triggers

大磯 秀幸, 福地 一斗, 秋本 洋平, 佐久間 淳

Various backdoor attack methods have been studied in order to assess the risks of machine learning models. A model that was backdoored during training misclassifies a test-time input into a specific target class when that input contains a trigger (e.g., noise or a pixel pattern). In recent years, methods whose triggers are modeled on natural objects rather than artificial ones such as noise or pixel patterns have attracted attention, because they allow attacks in the physical world, not only in the digital domain, to be considered. However, existing methods use as triggers physical phenomena that are simulated so that the image looks natural, such as "a trigger simulating reflections in an image." The resulting images look natural, but the trigger still has to be added digitally, so the attack is not physically realizable. We therefore propose a highly stealthy backdoor attack method that uses a high-quality generative model so that features physically realizable at test time serve as triggers. The proposed method is not tied to one particular phenomenon: a wide range of natural image features, such as "presence or absence of eyeglasses," "smiling or not," or "car color," can be selected and used as triggers. We evaluate the effectiveness of the proposed attack through experiments.
