Exit Through the Training Data: A Look into Instance-Attribution Explanations and Efficient Data Deletion in Machine Learning
Jonathan Brophy
Committee: Daniel Lowd (chair), Stephen Fickas, Thanh Nguyen
Area Exam (Sep 2020)
Keywords: Interpretability, explanations, instance-attribution, data deletion, machine unlearning

The widespread use of machine learning models, coupled with large datasets and increasingly complex architectures, has led to a general lack of understanding of how individual predictions are made. The GDPR even states that any individual has a “right to an explanation” from an automated decision if that decision can significantly impact their life. It is perhaps unsurprising, then, that explainable AI (XAI) has become a very popular research topic over the past several years. This survey examines one aspect of model interpretability: instance-attribution explanations. These techniques trace the prediction of a test instance back to the training instances that contributed most to it, and they are used as a means of debugging and improving models and datasets.
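As a concrete illustration (not a method from the surveyed works themselves), the sketch below computes a brute-force leave-one-out attribution: the influence of training instance i on a test prediction is the change in test loss when i is removed and the model is refit. Surveyed instance-attribution methods (e.g., influence functions) approximate this quantity without retraining; the model, dataset, and loss used here are illustrative assumptions.

```python
# Hypothetical leave-one-out instance attribution (brute force, for illustration).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
X_train, y_train, x_test, y_test = X[:-1], y[:-1], X[-1:], y[-1:]

def test_loss(model):
    # negative log-likelihood of the single test instance
    p = model.predict_proba(x_test)[0, y_test[0]]
    return -np.log(p)

base = LogisticRegression(max_iter=1000).fit(X_train, y_train)
base_loss = test_loss(base)

influence = np.zeros(len(X_train))
for i in range(len(X_train)):
    mask = np.arange(len(X_train)) != i
    m = LogisticRegression(max_iter=1000).fit(X_train[mask], y_train[mask])
    # positive value: removing instance i hurts the prediction, so i was helpful
    influence[i] = test_loss(m) - base_loss

print("most helpful training instances:", np.argsort(influence)[::-1][:5])
```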

Another issue confronting many institutions today is data privacy and ownership. Debates surrounding these topics have resulted in legal action; for example, the GDPR states that companies must comply with requests to remove user data. Removing user information from databases is straightforward; however, machine learning models are not inherently designed to accommodate such requests. Practitioners can always retrain a model from scratch with the requested samples removed, but this quickly becomes prohibitively expensive. Thus, the second part of this review focuses on efficient data deletion from machine learning models: works that remove the effect of one or more training instances without retraining the model from scratch.
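As a minimal sketch of the idea (an illustrative example, not a method from the survey), the class below performs exact deletion for a ridge-regression model whose solution depends only on the sufficient statistics A = XᵀX + λI and b = Xᵀy: removing an instance is a rank-one downdate of A and b followed by a re-solve, which avoids refitting on all remaining data. The class and method names are hypothetical.

```python
# Hypothetical exact unlearning for ridge regression via sufficient statistics.
import numpy as np

class UnlearnableRidge:
    def __init__(self, lam=1.0):
        self.lam = lam

    def fit(self, X, y):
        d = X.shape[1]
        self.A = X.T @ X + self.lam * np.eye(d)  # sufficient statistic X^T X + lam*I
        self.b = X.T @ y                          # sufficient statistic X^T y
        self.w = np.linalg.solve(self.A, self.b)
        return self

    def delete(self, x_i, y_i):
        # remove one instance's contribution, then re-solve: O(d^3), not O(n d^2)
        self.A -= np.outer(x_i, x_i)
        self.b -= y_i * x_i
        self.w = np.linalg.solve(self.A, self.b)
        return self

    def predict(self, X):
        return X @ self.w

# sanity check: deletion matches retraining from scratch on the remaining data
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
m = UnlearnableRidge().fit(X, y).delete(X[0], y[0])
m_retrained = UnlearnableRidge().fit(X[1:], y[1:])
assert np.allclose(m.w, m_retrained.w)
```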

For both research topics, the current landscape of work is surveyed, including the advantages and disadvantages of each approach, open research questions, and promising research opportunities. A discussion of how these two subfields are related follows, along with a practical use case: identifying and mitigating bias and misinformation in universal language models. Realizing this use case will require innovation in both areas.