Nvidia is allegedly scraping YouTube, Netflix, and more to train AI

As generative artificial intelligence grows in prevalence, it’s important that companies remain transparent about the data used to train models. Generated content doesn’t materialise from nothing, meaning that informed consent from copyright holders is paramount. This is especially true in light of the scale of scraping by Nvidia to train its ‘Cosmos’ AI.

Cosmos, in this instance, does not refer to Nvidia’s existing product of the same name. Instead, it’s the internal codename for an AI model, as reported by 404 Media. According to emails obtained by the outlet, the goal of the project is to build a video foundation model “that encapsulates simulation of light transport, physics, and intelligence in one place to unlock various downstream applications critical to Nvidia.” In order to train it, the company has been scraping up to a staggering “80 years worth of videos per day.” Unfortunately, this apparently includes copyrighted material from platforms such as YouTube and Netflix.

Nvidia naturally denies any assertion that its practices are in breach of copyright law, also describing model training as a form of fair use due to its transformative nature. However, many platforms do not allow scraping as part of their terms of service, including Netflix and YouTube. More damningly, though, screenshots allegedly show Nvidia employees taking measures to circumvent scraping protections for the latter in service of Cosmos.

YouTube actively blocks the IP addresses of users running scrapers or mass-downloading tools on the platform. In response to this, Nvidia employees apparently used Amazon Web Services (AWS) to run and restart virtual machines to circumvent these protections. Curiously, this solution came from a member of the company’s Omniverse team.

To be clear, there’s no indication of Cosmos’ deployment in public or commercial products. However, 404 Media has obtained an alleged chart shared in emails that shows how the model would benefit various products, including GeForce, Omniverse, and others.

I strongly suggest reading the full report on 404 Media for a deeper look into Cosmos and its development. Nvidia has yet to comment publicly on the outlet’s findings beyond the statements it has already provided.

In the absence of any concrete conclusions, instances like this further fuel my general distaste for generative AI. For transparency, I’m entirely comfortable with uses akin to Google’s Magic Eraser, a feature I use frequently on my Pixel smartphone. I also see no problem with DLSS Frame Generation. In terms of generating content wholesale, though, I’m far less comfortable, and I don’t see that position changing for the foreseeable future.
