If you've ever uploaded photos or art, written a review, "liked" content, answered a question on Reddit, contributed to open source code, or done any number of other activities online, you've done free work for tech companies, because downloading all this content from the web is how their AI systems learn about the world.
Tech companies know this, but they mask your contributions to their products with technical terms like "training data," "unsupervised learning," and "data exhaust" (and, of course, impenetrable "Terms of Use" documents). In fact, much of the innovation in AI over the past few years has been in ways to use more and more of your content for free. This is true for search engines like Google, social media sites like Instagram, AI research startups like OpenAI, and many other providers of intelligent technologies.
This exploitative dynamic is particularly damaging when it comes to the new wave of generative AI programs like Dall-E and ChatGPT. Without your content, ChatGPT and all of its ilk simply would not exist. Many AI researchers think that your content is actually more important than what computer scientists are doing. Yet these intelligent technologies that exploit your labor are the very same technologies that are threatening to put you out of a job. It's as if the AI system were going into your factory and stealing your machine.
But this dynamic also means that the users who generate data have a lot of power. Discussions about the use of sophisticated AI technologies often start from a place of powerlessness: the stance that AI companies will do what they want and that there's little the public can do to shift the technology in a different direction. We are AI researchers, and our research suggests the public has a tremendous amount of "data leverage" that can be used to create an AI ecosystem that both generates amazing new technologies and shares the benefits of those technologies fairly with the people who created them.
From Wired