That secret weapon is “differential privacy,” a relatively new field of data science focused on carefully adding random noise to an individual user’s information before it’s uploaded to the cloud. That way, the aggregate dataset a company like Apple collects can still reveal meaningful patterns, without any one person’s secrets being spilled.
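To see what that noise injection looks like in its simplest form, consider the textbook “randomized response” mechanism, the building block that most local differential privacy schemes elaborate on. What follows is an illustrative sketch, not Apple’s actual implementation; the function names, the single-bit setting, and the 30 percent simulated trait rate are assumptions made for clarity.

```python
import math
import random

def randomized_response(truth: bool, epsilon: float) -> bool:
    """Report one bit about a user with local differential privacy.

    The device tells the truth with probability e^eps / (e^eps + 1)
    and lies otherwise, so no single report can be trusted, yet the
    aggregate can still be estimated. Illustrative sketch only; not
    Apple's deployed mechanism.
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return truth if random.random() < p_truth else not truth

def estimate_true_rate(reports: list[bool], epsilon: float) -> float:
    """Debias the noisy reports to recover the population-level rate."""
    observed = sum(reports) / len(reports)
    e = math.exp(epsilon)
    # Invert: observed = rate * e/(e+1) + (1 - rate) * 1/(e+1)
    return (observed * (e + 1) - 1) / (e - 1)

# Simulate 100,000 users, 30% of whom have some sensitive trait.
reports = [randomized_response(random.random() < 0.3, epsilon=1.0)
           for _ in range(100_000)]
print(estimate_true_rate(reports, epsilon=1.0))  # prints roughly 0.30
```

Run over enough simulated users, the debiased estimate converges on the true population rate, even though every individual report remains deniable.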
Researchers at the University of Southern California, Indiana University, and China’s Tsinghua University have dug into the code of Apple’s MacOS and iOS operating systems to reverse-engineer just how the company’s devices implement differential privacy in practice. They’ve examined how Apple’s software injects random noise into personal information—ranging from emoji usage to your browsing history to HealthKit data to search queries—before your iPhone or MacBook uploads that data to Apple’s servers.
But differential privacy’s effectiveness hinges on a variable known as the “privacy loss parameter,” or “epsilon,” which sets how much specificity a data collector is willing to sacrifice for the sake of protecting its users’ secrets: a lower epsilon means more noise and stronger privacy, while a higher epsilon means more accurate, and more revealing, data. By taking apart Apple’s software to determine the epsilon the company chose, the researchers found that MacOS uploads significantly more specific data than the typical differential privacy researcher would consider private. iOS 10 uploads even more.
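The practical stakes of that one number are easy to see in the same randomized-response sketch from above: as epsilon grows, the probability that a device simply reports the truth approaches certainty, and the plausible deniability that differential privacy promises evaporates. The epsilon values below are illustrative choices, not figures taken from the study.

```python
import math

# In binary randomized response, a device reports its true bit with
# probability e^eps / (e^eps + 1). Higher epsilon means less noise,
# and therefore less cover for any individual user.
for eps in (0.5, 1.0, 2.0, 6.0, 14.0):  # illustrative values only
    p_truth = math.exp(eps) / (math.exp(eps) + 1)
    print(f"epsilon = {eps:>4}: truth reported {p_truth:.6f} of the time")
```

At an epsilon of 0.5, a report is only modestly more likely to be true than false; by the double digits, the “noise” is all but gone and the report is effectively the raw data.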