Apple, Nvidia, Anthropic used thousands of YouTube videos to train AI

In response to the lawsuits, defendants such as Meta, OpenAI and Bloomberg have argued that their actions constitute fair use. A case against EleutherAI, which originally scraped the books and made them public, was voluntarily dismissed by the plaintiffs.

Litigation in the remaining cases is still in its early stages, leaving questions of permission and payment unresolved. The Pile has since been removed from its official download page, but it is still available on file-sharing services.

“Tech companies are screwed,” said Amy Keller, a consumer protection attorney and partner at the firm DiCello Levitt, which has filed lawsuits on behalf of creators whose work was allegedly taken by AI firms without their consent.

“People are concerned about the fact that they didn’t have a choice in the matter,” Keller said. “I think that’s what’s really problematic.”

Parroting a parrot

Many creatives feel uncertain about the way forward.

Full-time YouTubers patrol for unauthorized use of their work, regularly filing takedown notices, and some worry it’s only a matter of time before AI can generate content similar to what they do — if not produce direct copies.

Pakman, the creator of The David Pakman Show, saw the power of AI recently while scrolling on TikTok. He came across a video that was labeled as a Tucker Carlson clip, but when Pakman watched it, he was surprised. It sounded like Carlson, but it was, word for word, what Pakman had said on his YouTube show, right down to the cadence. He was equally alarmed that only one of the video’s commenters seemed to realize it was fake: a voice clone of Carlson reading Pakman’s script.

“That’s going to be a problem,” Pakman said in a YouTube video he made about the forgery. “You can do that with basically anyone.”

EleutherAI co-founder Sid Black wrote on GitHub that he created YouTube Captions using a script. That script downloads subtitles from the YouTube API in the same way that a YouTube viewer’s browser downloads them when playing a video. According to documentation on GitHub, Black used 495 search terms to retrieve videos, including “funny vlogger,” “Einstein,” “black protestant,” “Social Protective Services,” “infowars,” “quantum chromodynamics,” “Ben Shapiro,” “Uighurs,” “fruit growers,” “dessert recipe,” “Nazca lines” and “flat earth.”

Although YouTube’s terms of service prohibit access to its videos by “automated means,” more than 2,000 GitHub users have bookmarked or approved the code.

“There are many ways in which YouTube could prevent this module from working, if that’s what they’re looking for,” machine learning engineer Jonas Depoix wrote in a GitHub discussion, where he published the code Black used to access YouTube subtitles. “That hasn’t happened so far.”

In an email to Proof News, Depoix said he hasn’t used the code since he wrote it as a college student for a project a few years ago and was surprised that people found it useful. He declined to answer questions about YouTube’s rules.

Google spokesman Jack Malon said in an emailed response to a request for comment that the company has taken “actions over the years to prevent abusive and unauthorized scraping.” He did not respond to questions about other companies’ use of the material as training data.

Among the videos used by AI companies are 146 from Einstein the parrot, a channel with nearly 150,000 subscribers. Marcia, the African gray’s caretaker, who did not want to use her last name for fear of jeopardizing the famous bird’s safety, said that at first she thought it was funny to learn that AI models had swallowed the words of a mimicking parrot.

“Who would want to use a parrot’s voice?” Marcia said. “But then, I know he speaks very well. He speaks with my voice. So he’s parroting me, and then AI’s parroting the parrot.”

Once ingested by AI, the data cannot be unlearned. Marcia was troubled by all the unknown ways in which her bird’s information could be used, including creating a digital duplicate parrot and, she worried, making it curse.

“We’re treading into uncharted territory,” Marcia said.
