My deepfake shows how valuable our data is in the age of AI

And it’s scary how little control we have over how it’s used.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Deepfakes are getting good. Like, really good. Earlier this month I went to a studio in East London to get myself digitally cloned by the AI video startup Synthesia. They made a hyperrealistic deepfake that looked and sounded just like me, with realistic intonation. It is a long way away from the glitchiness of earlier generations of AI avatars. The end result was mind-blowing. It could easily fool someone who doesn’t know me well.

Synthesia has managed to create AI avatars that are remarkably humanlike after only one year of tinkering with the latest generation of generative AI. It's equally exciting and daunting to think about where this technology is going. It will soon be very difficult to differentiate between what is real and what is not, and this is a particularly acute threat given the record number of elections happening around the world this year.

We are not ready for what is coming. If people become too skeptical about the content they see, they might stop believing in anything at all, which could enable bad actors to take advantage of this trust vacuum and lie about the authenticity of real content. Researchers have called this the “liar’s dividend.” They warn that politicians, for example, could claim that genuinely incriminating information was fake or created using AI.

I just published a story on my deepfake creation experience, and on the big questions about a world where we increasingly can’t tell what’s real. Read it here.

But there is another big question: What happens to our data once we submit it to AI companies? Synthesia says it does not sell the data it collects from actors and customers, although it does release some of it for academic research purposes. The company uses avatars for three years, at which point actors are asked if they want to renew their contracts. If so, they come into the studio to make a new avatar. If not, the company deletes their data.

But other companies are not that transparent about their intentions. As my colleague Eileen Guo reported last year, companies such as Meta license actors' data—including their faces and expressions—in a way that allows the companies to do whatever they want with it. Actors are paid a small up-front fee, but their likeness can then be used to train AI models in perpetuity without their knowledge.

Even if contracts for data are transparent, they don’t apply if you die, says Carl Öhman, an assistant professor at Uppsala University who has studied the online data left by deceased people and is the author of a new book, The Afterlife of Data. The data we input into social media platforms or AI models might end up benefiting companies and living on long after we’re gone.

“Facebook is projected to host, within the next couple of decades, a couple of billion dead profiles,” Öhman says. “They’re not really commercially viable. Dead people don’t click on any ads, but they take up server space nevertheless,” he adds. This data could be used to train new AI models, or to make inferences about the descendants of those deceased users. The whole model of data and consent with AI presumes that both the data subject and the company will live on forever, Öhman says.

Our data is a hot commodity. AI language models are trained by indiscriminately scraping the web, and that also includes our personal data. A couple of years ago I tested whether GPT-3, the predecessor of the language model powering ChatGPT, had anything on me. It struggled, but I found that I was able to retrieve personal information about MIT Technology Review's editor in chief, Mat Honan.

High-quality, human-written data is crucial to training the next generation of powerful AI models, and we are on the verge of running out of free online training data. That’s why AI companies are racing to strike deals with news organizations and publishers to access their data treasure chests.

Old social media sites are also a potential gold mine: when companies go out of business or platforms stop being popular, their assets, including users’ data, get sold to the highest bidder, says Öhman.

“MySpace data has been bought and sold multiple times since MySpace crashed. And something similar may well happen to Synthesia, or X, or TikTok,” he says.

Some people may not care much about what happens to their data, says Öhman. But securing exclusive access to high-quality data helps cement the monopoly position of large corporations, and that harms us all. This is something we need to grapple with as a society, he adds.

Synthesia said it will delete my avatar after my experiment, but the whole experience did make me think of all the cringeworthy photos and posts that haunt me on Facebook and other social media platforms. I think it’s time for a purge.

Melissa Heikkilä
