A Randomized Response Layer for Ensuring User Privacy in Synthetic Data Generation
Abstract
This paper introduces a modification to the variational autoencoder (VAE) that adds a randomized response (RR) layer, which perturbs the raw data before it is fed into the encoder. This modification allows us to create a synthetic dataset that ensures user privacy. We generate synthetic data from four open datasets and compare it with well-known synthetic data generation approaches. We evaluate utility using the average variation distance (AVD), which quantifies the discrepancy between the joint distributions of the real and synthetic datasets. To evaluate privacy, we measure whether each row in the synthetic data is novel (that is, does not appear in the real data) using a tool developed by the Synthetic Data Vault (SDV). Our proposed method achieves AVD values similar to those of the non-private methods and higher AVD values than the privacy-preserving method.
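The abstract does not say which randomized response mechanism the RR layer uses or how the privacy budget is split across columns. The following is a minimal sketch that assumes integer-coded categorical columns and the standard k-ary (generalized) randomized response. Each value is kept with probability p = e^eps / (e^eps + k - 1). Otherwise it is replaced by one of the other k - 1 categories, chosen uniformly at random. This mechanism satisfies eps-local differential privacy for a single attribute. The names `k_randomized_response`, `rr_layer`, and `epsilon_per_column` are hypothetical and do not come from the authors' code.

```python
import numpy as np


def k_randomized_response(values: np.ndarray, k: int, epsilon: float,
                          rng: np.random.Generator) -> np.ndarray:
    """Perturb integer-coded categories in {0, ..., k-1} with k-ary randomized response.

    Each value is kept with probability p = e^eps / (e^eps + k - 1); otherwise it is
    replaced by one of the other k - 1 categories, chosen uniformly at random.
    Assumes k >= 2.
    """
    values = np.asarray(values)
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    keep = rng.random(values.shape) < p_keep
    # An offset in {1, ..., k-1} guarantees the replacement differs from the true value
    # and makes every other category equally likely.
    offsets = rng.integers(1, k, size=values.shape)
    replaced = (values + offsets) % k
    return np.where(keep, values, replaced)


def rr_layer(data: np.ndarray, cardinalities: list[int], epsilon_per_column: float,
             seed: int = 0) -> np.ndarray:
    """Hypothetical RR layer: perturb each categorical column independently,
    producing the noisy table that would be fed to the VAE encoder."""
    rng = np.random.default_rng(seed)
    noisy = np.empty_like(data)
    for j, k in enumerate(cardinalities):
        noisy[:, j] = k_randomized_response(data[:, j], k, epsilon_per_column, rng)
    return noisy
```

As an example, `rr_layer(X, cardinalities=[2, 3], epsilon_per_column=1.0)` perturbs a two-column table whose columns have 2 and 3 categories. Under this sketch, the noisy output would be passed to the VAE encoder instead of `X`. Each column is perturbed independently, so by sequential composition the total budget for one record is the sum of its per-column epsilons.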
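In the private synthetic data literature, the average variation distance is commonly defined as follows. For each combination of alpha attributes, compute the total variation distance between the real and synthetic alpha-way marginal distributions, then average over all combinations. The abstract states neither the marginal order alpha nor how continuous columns are discretized. This sketch therefore assumes categorical columns, no missing values, and alpha = 2 by default. `average_variation_distance` is a hypothetical helper. Lower values mean the synthetic data is closer to the real data.

```python
from itertools import combinations

import pandas as pd


def total_variation_distance(p: pd.Series, q: pd.Series) -> float:
    """TVD = 0.5 * sum over cells of |p(x) - q(x)|; cells missing on one side count as 0."""
    p, q = p.align(q, join="outer", fill_value=0.0)
    return 0.5 * float((p - q).abs().sum())


def average_variation_distance(real: pd.DataFrame, synth: pd.DataFrame,
                               alpha: int = 2) -> float:
    """Average TVD between real and synthetic alpha-way marginals.

    Assumes categorical (or pre-discretized) columns and no missing values.
    """
    distances = []
    for cols in combinations(real.columns, alpha):
        keys = list(cols)
        # Empirical joint distribution of the chosen columns in each table.
        p = real.groupby(keys).size() / len(real)
        q = synth.groupby(keys).size() / len(synth)
        distances.append(total_variation_distance(p, q))
    if not distances:
        raise ValueError("alpha exceeds the number of columns")
    return sum(distances) / len(distances)
```

Each TVD lies between 0 (identical marginals) and 1 (disjoint support), and so does the average.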
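The novelty check used for privacy can be approximated with an exact-match test: a synthetic row counts as novel if no identical row exists in the real data. This minimal sketch assumes no missing values and identical column order in both tables. SDV's own tool may differ in detail, for example by tolerating small differences in numerical columns. `novel_row_fraction` is a hypothetical name and is not part of SDV.

```python
import pandas as pd


def novel_row_fraction(real: pd.DataFrame, synth: pd.DataFrame) -> float:
    """Share of synthetic rows with no exact duplicate among the real rows.

    Assumes both tables have the same columns in the same order and no missing values.
    """
    real_rows = set(real.itertuples(index=False, name=None))
    novel = sum(row not in real_rows
                for row in synth.itertuples(index=False, name=None))
    return novel / len(synth)
```

A score of 1.0 means that no synthetic row copies a real record verbatim. It does not, by itself, bound how close synthetic rows are to real ones.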
Journal
- Proceedings of Computer Security Symposium 2023 (コンピュータセキュリティシンポジウム2023論文集), pp. 1397-1404, 2023-10-23
Details
- CRID: 1050297969507822720
- Web Site: http://id.nii.ac.jp/1001/00228693/
- Text Lang: en
- Article Type: conference paper
- Data Source: IRDB