A Randomized Response Layer for Ensuring User Privacy in Synthetic Data Generation

Abstract

This paper introduces a modification to the Variational Autoencoder (VAE) that inserts a randomized response (RR) layer in front of the encoder, so that the raw data are perturbed before they are fed into the encoder. This modification allows us to create a synthetic dataset that ensures user privacy. We generate synthetic data from four open datasets and compare our method with well-known synthetic data generation approaches. To evaluate utility, we compute the average variant distance (AVD), which quantifies the disparity between the joint distributions of the real and synthetic datasets. To evaluate privacy, we use a tool developed by SDV (the Synthetic Data Vault) to measure whether each row in the synthetic data is novel. Our proposed method achieves AVD values similar to those of the non-private methods and higher AVD values than the privacy-preserving method.
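The following minimal Python sketch illustrates the three components described above. It is not the authors' implementation, and the abstract leaves several details open, so each of the following is an assumption. The RR layer uses k-ary randomized response, applied independently to each categorical column with a hypothetical per-attribute budget `epsilon`. AVD is taken to be the average total variation distance over all two-way marginals. The novelty check is a simplified exact-match test standing in for SDV's tool. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def randomized_response(value, categories, epsilon, rng):
    """k-ary randomized response (assumed RR variant).

    Keeps the true value with probability e^eps / (e^eps + k - 1); otherwise
    reports one of the other k - 1 categories uniformly at random. This
    satisfies epsilon-local differential privacy for a single attribute.
    """
    k = len(categories)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([c for c in categories if c != value])


def rr_layer(rows, domains, epsilon, seed=0):
    """Hypothetical RR layer: perturb every column of every row before the
    rows reach the VAE encoder. With d columns perturbed independently at
    `epsilon` each, the row-level guarantee is d * epsilon by composition."""
    rng = random.Random(seed)
    return [
        tuple(randomized_response(v, domains[j], epsilon, rng) for j, v in enumerate(row))
        for row in rows
    ]


def total_variation(p_counts, q_counts):
    """Total variation distance 0.5 * sum |P(x) - Q(x)| between two empirical
    distributions given as Counters (missing keys count as 0)."""
    n_p, n_q = sum(p_counts.values()), sum(q_counts.values())
    keys = set(p_counts) | set(q_counts)
    return 0.5 * sum(abs(p_counts[x] / n_p - q_counts[x] / n_q) for x in keys)


def average_variation_distance(real, synth, order=2):
    """Assumed AVD definition: mean total variation distance over all
    `order`-way marginals of the real and synthetic tables."""
    dists = []
    for cols in combinations(range(len(real[0])), order):
        p = Counter(tuple(r[c] for c in cols) for r in real)
        q = Counter(tuple(s[c] for c in cols) for s in synth)
        dists.append(total_variation(p, q))
    return sum(dists) / len(dists)


def novel_row_fraction(real, synth):
    """Simplified novelty check: fraction of synthetic rows that do not
    exactly match any real row."""
    real_set = set(map(tuple, real))
    return sum(tuple(s) not in real_set for s in synth) / len(synth)


if __name__ == "__main__":
    real = [("a", "x"), ("b", "y"), ("a", "y"), ("b", "x")]
    domains = [["a", "b"], ["x", "y"]]
    noisy = rr_layer(real, domains, epsilon=1.0)  # input the encoder would see
    print(noisy)
    print(average_variation_distance(real, real))  # 0.0
    print(novel_row_fraction(real, [("a", "x"), ("b", "b")]))  # 0.5
```

Under this AVD definition, identical tables score 0 and tables with disjoint two-way marginals score 1. Lower values therefore indicate closer joint distributions. SDV's own novelty tool may apply additional rules, for example tolerance for numerical columns, that this exact-match version omits.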