A Membership Inference Attack Framework for Evaluating the Output of Synthetic Data Generation

Takayuki Miura, Masanobu Kii, Atsunori Ichikawa, Kazuki Iwahana, Toshiki Shibahara, Tetsuya Okuda, Juko Yamamoto, Naoto Yanai

A membership inference attack, which infers whether a particular individual's record belongs to the original data, is a known privacy threat to synthetic data generation. In this paper, we propose a framework for evaluating how much privacy leakage can arise from the output synthetic data alone, without using the trained model. Under this framework, we also construct concrete attack methods that focus on statistical information about the original data. Furthermore, we conduct experiments on public datasets to test membership inference attacks against the original data, using the outputs of synthetic data generation based on statistics, Bayesian networks, or neural networks. Whereas random guessing succeeds at a rate of 0.5, choosing the target sample to be a statistical outlier raises the attack accuracy to between 0.7 and 0.96 for several generation methods, confirming that membership inference becomes more accurate in that setting. We also find that the attack method that makes membership inference easiest differs from one synthesis method to another.
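The evaluation setting described in the abstract can be illustrated with a toy membership inference game. The sketch below is not the authors' framework or attack: it assumes a one-dimensional standard-normal population, a hypothetical statistics-based synthesizer that fits a mean and standard deviation and resamples from a Gaussian, and a hypothetical attacker that thresholds the spread of the synthetic data. All function names, parameters, and the threshold are assumptions made for illustration.

```python
import random
import statistics


def synthesize(original, m, rng):
    """Toy statistics-based synthesis (an assumption, not the paper's method):
    fit a mean and standard deviation, then sample m points from a Gaussian."""
    mu = statistics.fmean(original)
    sigma = statistics.stdev(original)
    return [rng.gauss(mu, sigma) for _ in range(m)]


def attack(synthetic, threshold):
    """Guess 'member' when the synthetic data is more spread out than expected.
    The attacker sees only the synthetic data, never the trained model."""
    return statistics.stdev(synthetic) > threshold


def attack_accuracy(target, trials=400, n=100, m=500, threshold=1.2, seed=0):
    """Play the membership game `trials` times and return the attacker's accuracy."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # A fair coin decides whether the target is in the original data.
        is_member = rng.random() < 0.5
        original = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
        # Either the target or a fresh population sample fills the last slot.
        original.append(target if is_member else rng.gauss(0.0, 1.0))
        synthetic = synthesize(original, m, rng)
        correct += attack(synthetic, threshold) == is_member
    return correct / trials


if __name__ == "__main__":
    print("outlier target :", attack_accuracy(target=10.0))
    print("ordinary target:", attack_accuracy(target=0.0))
```

In this toy setting, an outlier target noticeably inflates the fitted standard deviation, so the attacker can tell the two cases apart far more often than chance. An ordinary target leaves the statistics essentially unchanged, so accuracy stays near the 0.5 random-guess baseline. The specific numbers it prints are properties of this sketch only and are not comparable to the results reported in the abstract.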
