Data analysis revolves around the central goal of aggregating metrics. Aggregation must be done privately when data points correspond to personal information, such as the records or activities of specific users. Differential privacy (DP) is a method that limits the influence of each data point on the output of a computation. It has therefore become the most widely accepted approach to individual privacy.
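For reference, the standard definition can be written as follows. The symbols M, D, D', S, ε and δ are notation introduced here, not taken from the research being described. A randomized mechanism M is (ε, δ)-differentially private if, for every pair of neighboring datasets D and D' (differing in one record) and every set of outputs S:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```

Smaller values of ε and δ mean that any single record has less influence on the output distribution.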
Although differentially private algorithms are theoretically achievable for many tasks, they tend to be less efficient and less accurate in practice than their non-private counterparts. In particular, differential privacy is a worst-case requirement. The privacy guarantee must hold for any two neighboring datasets, regardless of how they were constructed and even if they were not sampled from any distribution, which can cause a significant loss of accuracy. This means that "unlikely points" with a large impact on the aggregate must still be accounted for in the privacy analysis.
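To make the worst-case cost concrete, here is a minimal sketch of a textbook ε-DP mean using clipping and Laplace noise. It is not the method from the research; the function names, the bounds, and the choice to treat n as public are assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_mean(values, lower, upper, epsilon, rng):
    """Clip to [lower, upper], average, add Laplace noise (n treated as public)."""
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Replacing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon
    return sum(clipped) / n + laplace_noise(scale, rng), scale

rng = random.Random(0)
data = [rng.gauss(50.0, 1.0) for _ in range(1000)]  # values concentrated near 50

# Bounds that match where the data actually lives.
tight_estimate, tight_scale = dp_mean(data, 40.0, 60.0, 1.0, rng)
# Bounds that must cover every value an adversarial record could take.
loose_estimate, loose_scale = dp_mean(data, -1e6, 1e6, 1.0, rng)

print(tight_scale, loose_scale)
```

With the tight bounds the noise scale is 0.02; with the loose, worst-case bounds it is 2000, even though the data itself is identical. That gap is the accuracy loss the worst-case requirement can impose.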
Recent research by Google and Tel Aviv University provides a general framework for pre-processing data so that it is guaranteed to be "friendly". When the data is known to be friendly, the private aggregation step can be performed without accounting for potentially influential "unfriendly" elements. Because the aggregation stage no longer has to operate in the original worst-case setting, the proposed method has the potential to significantly reduce the amount of noise introduced at this stage.
The researchers first formally define the conditions under which a dataset can be considered friendly. These conditions vary with the type of aggregation required, but they always include datasets for which the sensitivity of the aggregate is low. For example, if the aggregate is the average, "friendly" should include tightly clustered datasets, that is, datasets with a small diameter.
The team developed the FriendlyCore filter, which reliably extracts a sizable friendly subset (the core) from the input. The algorithm is designed to meet two criteria:
- It should eliminate outliers to keep only elements close to many others in the core.
- For neighboring datasets that differ by a single element, y, the filter outputs each element other than y with nearly the same probability. As a result, the cores extracted from these neighboring datasets can be joined together coherently.
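The two criteria can be illustrated with a toy filter. This is a simplified sketch, not the FriendlyCore filter itself, and it carries no privacy guarantee. The name `toy_core_filter`, the `radius` parameter, and the 1/2 and 3/4 thresholds are all hypothetical choices made for illustration.

```python
import math
import random

def toy_core_filter(points, radius, rng):
    """Keep each point with a probability that grows with how many points lie near it.

    Illustrative only: NOT the published FriendlyCore filter, no privacy guarantee.
    """
    n = len(points)
    core = []
    for p in points:
        # Count points within `radius` of p (p itself included).
        near = sum(1 for q in points if math.dist(p, q) <= radius)
        frac = near / n
        # Hypothetical smooth ramp: probability 0 when at most half of the
        # points are near, 1 when at least three quarters are, linear between.
        keep_prob = min(1.0, max(0.0, (frac - 0.5) / 0.25))
        if rng.random() < keep_prob:
            core.append(p)
    return core

# 20 tightly clustered points plus one far-away outlier.
cluster = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)]
outlier = (100.0, 100.0)
core = toy_core_filter(cluster + [outlier], radius=1.0, rng=random.Random(0))
print(len(core), outlier in core)
```

The outlier is near no other point, so it is dropped, which mirrors the first criterion. The ramp only gestures at the second: replacing one element changes the neighbor count of any other point by at most one, so its keep probability moves by at most 4/n. That alone does not make the filter differentially private.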
The team then designed friendly DP algorithms, which satisfy a weaker notion of privacy (one that is required to hold only on friendly inputs) and can therefore add less noise to the aggregate. The team proved that applying a friendly DP aggregation algorithm to the core produced by a filter meeting the two criteria above yields a composition that is differentially private in the conventional sense. Beyond averaging, this approach also applies to clustering and to estimating the covariance matrix of a Gaussian distribution.
The researchers evaluated FriendlyCore-based algorithms under the zero-concentrated differential privacy (zCDP) model. For mean estimation, they drew 800 samples from a Gaussian distribution with an unknown mean and used the CoinPress algorithm as a benchmark. Unlike FriendlyCore, CoinPress requires an upper bound R on the norm of the mean. Because the proposed method does not depend on this upper bound or on the dimension, it outperforms CoinPress in these experiments.
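For context, a standard fact about zCDP explains why sensitivity matters here. The notation is introduced for illustration and does not come from the research itself. For a query q with L2 sensitivity Δ₂ on d-dimensional outputs, the Gaussian mechanism satisfies ρ-zCDP when the noise level is chosen as:

```latex
M(D) = q(D) + \mathcal{N}\!\left(0, \sigma^2 I_d\right), \qquad \sigma = \frac{\Delta_2}{\sqrt{2\rho}}
```

The noise scales linearly with the sensitivity, so an aggregation step that only has to handle friendly inputs, and therefore a smaller effective sensitivity, can add proportionally less noise for the same ρ.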
The team also evaluated their private k-means clustering algorithm by comparing it with LSH clustering, an algorithm based on recursive locality-sensitive hashing. Each experiment was repeated 30 times. For small values of n (the number of samples drawn from the mixture), FriendlyCore often fails and gives inaccurate results. As n grows, however, the proposed technique becomes more likely to succeed (because the generated tuples get closer to each other) and produces very accurate results, while LSH clustering lags behind. FriendlyCore also performs well on large datasets even when they lack a clear separation into clusters.
Tanushree Shenwai is a Consultant Intern at MarktechPost. She is currently pursuing a Bachelor of Technology degree from the Indian Institute of Technology (IIT), Bhubaneswar. She is a Data Science enthusiast and has a keen interest in the scope of application of Artificial Intelligence in various fields. She is passionate about researching new advancements in technology and their real-life applications.