Cross-Silo LDP De-Anonymization
Abstract
When a person's records appear in $k$ independent data silos, each protected by $(\varepsilon, \delta)$-differential privacy, standard composition yields a valid $(k\varepsilon, k\delta)$-DP guarantee for the joint output. This worst-case bound, however, does not answer the concrete inference question: _at what $k$ can an adversary actually identify a target person?_ This paper develops the information-theoretic framework needed to answer that question. We introduce _cross-silo person-level DP_ (XSP-DP), a Pufferfish-style privacy notion whose adjacency relation captures all records of a single person across all silos simultaneously, and verify that the standard basic composition bound $(\sum_i \varepsilon_i, \sum_i \delta_i)$-DP carries over to this adjacency model. Within this framework we prove that de-anonymization undergoes a phase transition at $k^* = \Theta(\log n / \varepsilon^2)$ (population size $n$, per-silo RR parameter $\varepsilon$): a Fano lower bound shows that any estimator fails for $k \ll k^*$, while a matching maximum-likelihood upper bound shows that the attack succeeds for $k \gg k^*$. An explicit XOR + randomized-response construction demonstrates information synergy: each silo's output is individually uninformative about the target ($I(Z; Y_i) = 0$), yet the joint mutual information is strictly positive. For non-coordinated binary randomized-response mechanisms, we prove that de-anonymization is inevitable once $k$ exceeds the threshold, establishing that cross-silo coordination is necessary. These results provide a baseline threat model and a $\Theta$-level threshold for cross-silo inference attacks under local DP. Sharp constants, second-order thresholds, and spectral characterizations of the phase transition are developed in a companion paper; coordinated defense protocols and their system-level guarantees are treated separately.
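The synergy claim can be checked numerically on a toy instance. The sketch below is an illustration under stated assumptions, not the paper's construction: it takes two silos, a uniform target bit Z, an independent uniform mask U, silo inputs X1 = U and X2 = U XOR Z, and independent binary randomized response with a hypothetical parameter `eps` that flips each input with probability 1/(1 + e^eps). The function names are illustrative. The joint distribution is enumerated exactly, so no sampling error is involved.

```python
import itertools
import math


def rr_flip_prob(eps):
    """Probability that binary randomized response flips its input bit."""
    return 1.0 / (1.0 + math.exp(eps))


def joint_distribution(eps):
    """Exact joint pmf of (Z, Y1, Y2) for a two-silo XOR + RR toy model.

    Assumed model (not taken from the paper): Z and U are independent
    uniform bits, silo 1 holds X1 = U, silo 2 holds X2 = U XOR Z, and
    each silo releases Y_i = RR_eps(X_i) independently.
    """
    q = rr_flip_prob(eps)
    pmf = {}
    for z, u, n1, n2 in itertools.product((0, 1), repeat=4):
        # n1, n2 indicate whether RR flipped the bit in silo 1 / silo 2.
        p = 0.25 * (q if n1 else 1 - q) * (q if n2 else 1 - q)
        key = (z, u ^ n1, (u ^ z) ^ n2)
        pmf[key] = pmf.get(key, 0.0) + p
    return pmf


def mutual_information(pmf, a_idx, b_idx):
    """I(A; B) in bits, with A and B given as coordinate indices of the key."""
    pa, pb, pab = {}, {}, {}
    for key, p in pmf.items():
        a = tuple(key[i] for i in a_idx)
        b = tuple(key[i] for i in b_idx)
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
        pab[(a, b)] = pab.get((a, b), 0.0) + p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in pab.items()
        if p > 0
    )


pmf = joint_distribution(eps=1.0)
i_y1 = mutual_information(pmf, (0,), (1,))       # I(Z; Y1)
i_y2 = mutual_information(pmf, (0,), (2,))       # I(Z; Y2)
i_joint = mutual_information(pmf, (0,), (1, 2))  # I(Z; Y1, Y2)
print(f"I(Z;Y1)={i_y1:.3e}  I(Z;Y2)={i_y2:.3e}  I(Z;Y1,Y2)={i_joint:.4f}")
```

In this toy model each marginal term vanishes because the mask U makes every single output independent of Z. The pair (Y1, Y2) reveals Y1 XOR Y2, which is Z passed through a binary symmetric channel with crossover probability 2q(1 - q) for q = 1/(1 + e^eps), so the joint mutual information is positive for any eps > 0.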