Now that machines can learn, can they unlearn?

Companies of all sizes use machine learning to understand people's likes and dislikes. Some researchers are now asking a different question: how can we make machines forget?
Machine unlearning, a nascent area of computer science, seeks to induce selective amnesia in artificial intelligence software. The goal is to remove all trace of a particular person or data point from a machine-learning system without hurting its performance.

The concept, if made practical, could give people more control over their data and the value derived from it. Users can already ask companies to delete their personal data, but they are generally in the dark about which algorithms that information helped train or tune. Machine unlearning would let a person withdraw both their data and a company's ability to profit from it.

Anyone who has ever regretted something they shared online can grasp the appeal of artificial amnesia; building it into software, though, requires some new ideas in computer science. Companies spend millions of dollars training machine-learning algorithms to recognize faces or rank social posts, because the algorithms can often solve problems faster than human coders alone. Once trained, a machine-learning system is not easily altered, or even well understood. The conventional way to remove the influence of a single data point is to rebuild the system from scratch, a potentially expensive exercise. Aaron Roth, a professor at the University of Pennsylvania who works on machine unlearning, says the research aims to find a middle ground: can we remove all influence of someone's data when they ask for it to be deleted, while avoiding the full cost of retraining from scratch?


Work on machine unlearning is motivated partly by growing awareness of the ways artificial intelligence can erode privacy. Data regulators around the world have long had the power to force companies to delete ill-gotten information. Citizens of the EU and California can ask a company to delete their data if they change their mind about what they shared. More recently, regulators in the US and Europe have said that the owners of AI systems must sometimes go further and delete a system that was trained on sensitive data.

Last year, the UK's data regulator warned companies that some machine-learning software could be subject to GDPR rights such as data deletion, because an AI system can contain personal data. Security researchers have shown that algorithms can sometimes leak the sensitive data used to create them. The US Federal Trade Commission ordered the facial-recognition startup Paravision to delete a collection of improperly obtained face photos and the machine-learning algorithms trained on them. FTC Commissioner Rohit Chopra praised the new enforcement tactic as a way to force a company that breaks data rules to forfeit its gains.

The small field of machine unlearning research grapples with some of the mathematical and practical questions raised by those regulatory shifts. Researchers have shown that machine-learning algorithms can be made to forget under certain conditions, but the technology is not yet ready for prime time. As is common for a young field, Roth says, there is still a gap between what researchers aspire to do and what they know how to do today.

In 2019, researchers from the University of Wisconsin-Madison and the University of Toronto proposed a promising approach: splitting the source data for a machine-learning project into multiple pieces. Each piece is processed separately before the results are combined into the final machine-learning model. If one data point later needs to be forgotten, only a fraction of the original input data has to be reprocessed. The approach worked well on data about online purchases and on a collection of more than a million photos.
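A rough sketch of the idea in Python may help. The shard count, the simple logistic-regression models, and the vote-averaging step below are illustrative assumptions for clarity, not the researchers' actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_shards(X, y, n_shards=4):
    """Split the training data into shards and fit one small model per shard.

    Assumes binary labels; the shard assignment here is a simple round-robin.
    """
    shard_ids = np.arange(len(X)) % n_shards   # which shard each row belongs to
    models = []
    for s in range(n_shards):
        mask = shard_ids == s
        models.append(LogisticRegression(max_iter=1000).fit(X[mask], y[mask]))
    return models, shard_ids

def unlearn(X, y, models, shard_ids, row_to_forget):
    """Forget one row by retraining only the shard that contained it."""
    s = shard_ids[row_to_forget]
    keep = (shard_ids == s) & (np.arange(len(X)) != row_to_forget)
    models[s] = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    return models

def predict(models, X):
    """Combine the shard models, here by averaging their votes."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because only one shard is retrained when a row is deleted, the cost of forgetting scales with the shard size rather than with the full dataset.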


Roth and his collaborators at Stanford, Harvard, and Penn recently showed a flaw in this approach. They demonstrated that the unlearning system would break down if deletion requests arrived in a particular sequence, whether by chance or through the actions of a malicious actor. They also showed how to mitigate the problem.

Gautam Kamath, a professor at the University of Waterloo who also works on unlearning, says the problem that project found and fixed is just one of many open questions standing between machine unlearning as a laboratory curiosity and a practical technology. His research group is currently investigating how much a system's accuracy drops when it is made to successively unlearn multiple data points.

Kamath also wants to find ways for companies to prove to regulators that a system really has forgotten what it was supposed to unlearn. That still feels a ways off, he says, but perhaps there will eventually be auditors for this sort of thing.

Regulatory pressure to explore machine unlearning may grow as the FTC and other agencies look more closely at the power of algorithms. Reuben Binns, an Oxford University professor who studies data protection, says the idea that individuals should have some say over the fate and fruits of their data has gained ground in both the US and Europe in recent years.

Tech companies will need to do a lot of technical work before they can offer machine unlearning as a way to give people more control over the algorithmic fate of their data. Even then, the technology might do little to reduce the privacy risks of the AI age.

Differential privacy, a clever technique for putting mathematical limits on how much a system can leak about any one person, offers a useful comparison. Apple, Google, and Microsoft all tout the technology, but it is used relatively rarely, and privacy dangers remain plentiful.
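As a rough illustration of how such a mathematical limit works, here is a minimal sketch of the standard Laplace mechanism for a counting query; the function name and parameters are hypothetical, not any particular company's implementation.

```python
import numpy as np

def noisy_count(records, epsilon=1.0):
    """Release the number of records with Laplace noise added.

    Adding or removing one person changes the true count by at most 1
    (sensitivity 1), so noise drawn from Laplace(scale=1/epsilon) bounds
    how much the released number can reveal about any single individual.
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = len(records)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise
```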

Binns believes machine unlearning could prove genuinely useful in some cases, but he suspects it may end up more as a way for companies to demonstrate technical prowess than as a major shift in data protection. Even if machines can forget, users will still need to be careful about whom they share their data with.

This story first appeared on wired.com