Differentially Private Release of Hierarchical Origin/Destination Data with a TopDown Approach
Boninsegna, Fabrizio; Silvestri, Francesco
2025
Abstract
This paper presents a novel method for generating differentially private tabular datasets for hierarchical data, specifically focusing on origin-destination (O/D) trips. The approach builds upon the TopDown algorithm, a constraint-based mechanism developed by the US Census Bureau to incorporate invariant queries into tabular data. O/D hierarchical data refers to datasets representing trips between geographical areas organized in a hierarchical structure (e.g., region → province → city). The proposed method is designed to improve the accuracy of queries covering broader geographical areas, which are derived through aggregation. This feature provides a “zoom-in” effect on the dataset, ensuring that when zoomed back out, the overall picture is preserved. Furthermore, the approach aims to reduce the number of false positives. These characteristics can strengthen practitioners’ and decision-makers’ confidence in adopting differentially private datasets. The main technical contributions of this paper include a novel TopDown algorithm that employs constrained optimization with Chebyshev distance minimization, with theoretical guarantees on the maximum absolute error. Additionally, we propose a new integer optimization algorithm that significantly reduces the incidence of false positives. The effectiveness of the proposed approach is validated using real-world and synthetic O/D datasets, demonstrating its ability to generate private data with high utility and a reduced number of false positives. Our experiments focus on O/D datasets with a single trip as the unit of privacy; nevertheless, the proposed approach supports other units of privacy and also applies to any tabular data with a hierarchical structure.
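To give a feel for what constrained optimization with Chebyshev distance minimization means at a single node of the hierarchy, the sketch below adjusts noisy child counts so that they are nonnegative, sum to a fixed parent total, and deviate from the noisy values as little as possible in the maximum absolute sense. It is only an illustration under stated assumptions, not the authors' algorithm: the function name `chebyshev_project`, the real-valued (non-integer) output, and the bisection strategy are all hypothetical choices made here.

```python
def chebyshev_project(noisy, total, iters=60):
    """Illustrative sketch (not the paper's algorithm).

    Returns (x, t) where x is nonnegative, sums to `total`, and t is
    (approximately) the smallest achievable max_i |x[i] - noisy[i]|.
    """
    if not noisy or total < 0:
        raise ValueError("need at least one count and a nonnegative total")

    def box(t):
        # Each x[i] must lie in [max(0, y - t), y + t] to stay within error t.
        lo = [max(0.0, y - t) for y in noisy]
        hi = [y + t for y in noisy]
        return lo, hi

    def feasible(t):
        # A vector summing to `total` fits in the box iff total is between
        # the sum of the lower bounds and the sum of the upper bounds.
        lo, hi = box(t)
        return sum(lo) <= total <= sum(hi)

    # Negative noisy counts must move up to at least 0, so t >= -min(noisy).
    t_lo = max(0.0, max(-y for y in noisy))
    # This upper bound is always feasible for a nonnegative total.
    t_hi = max(abs(y) for y in noisy) + total

    if feasible(t_lo):
        t = t_lo
    else:
        # Feasibility is monotone in t, so bisect for the smallest feasible t.
        for _ in range(iters):
            mid = (t_lo + t_hi) / 2
            if feasible(mid):
                t_hi = mid
            else:
                t_lo = mid
        t = t_hi

    # Spread the remaining mass proportionally to the slack in each interval.
    lo, hi = box(t)
    slack = sum(hi) - sum(lo)
    frac = (total - sum(lo)) / slack if slack > 0 else 0.0
    return [l + frac * (h - l) for l, h in zip(lo, hi)], t


# Hypothetical example: three noisy city-level counts under a parent total of 12.
values, max_err = chebyshev_project([10.5, -2.0, 4.0], 12)
```

In this toy example the negative noisy count forces a maximum error of at least 2, and the remaining adjustment is shared among the other children without exceeding that bound. A full TopDown pass would repeat such a step level by level and would also need integer-valued outputs, which this sketch does not attempt.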