Building mechanisms into computer systems that let them forget data when users request it was the focus of a recent paper by Columbia University researchers Yinzhi Cao and Junfeng Yang. The paper suggests basic changes to how user data enters and interacts with analytics systems and data streams. Taken together, the raw data, the computations performed on it, and the derived data form a propagation network the authors call the data's lineage.
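To make the lineage idea concrete, the sketch below tracks it as a simple propagation graph, so that everything transitively derived from a user's raw data can be located when that user asks to be forgotten. This is a minimal illustration in Python under assumed names (`LineageGraph`, `record`, `lineage`), not the authors' implementation.

```python
from collections import defaultdict

class LineageGraph:
    """Propagation network from raw data through computations to derived data."""

    def __init__(self):
        # Edges point from a source item to the items computed from it.
        self.derived_from = defaultdict(set)

    def record(self, source, derived):
        # Note that `derived` was computed, at least in part, from `source`.
        self.derived_from[source].add(derived)

    def lineage(self, item):
        # Everything transitively derived from `item`: the set that must be
        # updated or deleted when `item` is forgotten.
        seen, stack = set(), [item]
        while stack:
            for child in self.derived_from[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

g = LineageGraph()
g.record("user123/raw_clicks", "feature_vector_17")
g.record("feature_vector_17", "spam_model_v2")
print(g.lineage("user123/raw_clicks"))  # {'feature_vector_17', 'spam_model_v2'}
```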
The authors describe a system that keeps cloned data, both source copies and "calculated" derivatives, from becoming orphaned from its associated permissions by interposing a layer of "summations" between the data and the systems that access it. Learning systems use the data only through these summation proxies; when the proxies are updated or deleted, the underlying data becomes unavailable, and the copies that have propagated into other systems are both unidentifiable and unusable.
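A minimal Python sketch of the summation idea follows. It assumes a least-squares model whose weights depend only on two running summations, so forgetting one sample means subtracting its contribution and re-solving; the class and method names are illustrative choices, not taken from the paper.

```python
import numpy as np

class UnlearnableLinearModel:
    """Least-squares regression kept in summation form.

    The weights depend only on two summations over the training data:
        S_xx = sum_i outer(x_i, x_i)    S_xy = sum_i y_i * x_i
    so the learning system never touches individual samples directly.
    """

    def __init__(self, dim, lam=1e-6):
        self.S_xx = np.zeros((dim, dim))
        self.S_xy = np.zeros(dim)
        self.lam = lam  # small ridge term keeps the solve well-posed

    def learn(self, x, y):
        # Add one sample's contribution to the summations.
        self.S_xx += np.outer(x, x)
        self.S_xy += y * x

    def unlearn(self, x, y):
        # Forget one sample by subtracting exactly what learn() added;
        # the summations are then as if the sample had never been seen.
        self.S_xx -= np.outer(x, x)
        self.S_xy -= y * x

    def weights(self):
        # Recompute the model from the summations alone.
        dim = self.S_xy.shape[0]
        return np.linalg.solve(self.S_xx + self.lam * np.eye(dim), self.S_xy)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])

model = UnlearnableLinearModel(3)
for xi, yi in zip(X, y):
    model.learn(xi, yi)
model.unlearn(X[0], y[0])  # one user withdraws their sample

# Matches a model retrained from scratch without that sample.
retrained = UnlearnableLinearModel(3)
for xi, yi in zip(X[1:], y[1:]):
    retrained.learn(xi, yi)
assert np.allclose(model.weights(), retrained.weights())
```

The point of the structure is that forgetting touches only the summations, whose cost is independent of the number of remaining samples, rather than triggering a full retraining pass.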
Moreover, the summation methodology does not require rethinking existing systems from the ground up. The researchers retrofitted it onto real-world data analysis systems with changes of 20 to 300 lines of code, less than 1 percent of each existing system.
From The Stack