Artificial intelligence bots are all over the internet, searching for data to train language models. Much of that data is content created by real people, and many of them are not happy about their work being used that way. To combat this, companies are creating tools to prevent AI bots from accessing data on both their websites and their products.
Why are people concerned about AI bots?
Training generative AI requires significant amounts of data. To collect the information, several companies have AI bots scour the internet for content. Data comes in two forms: public and private. Public data is readily available to anyone on the internet, while private data “includes things like text messages, emails, and social media posts made from private accounts,” according to The New York Times. The problem is that public data is running out, leading to the creation of AI bots that scour the internet for the private alternative.
“As companies look to train their AI models on data protected by privacy laws, they are carefully rewriting their terms and conditions to include words like ‘artificial intelligence,’ ‘machine learning’ and ‘generative AI,’” the Times said. Essentially, businesses including Google and Meta have started using private user data, such as social media posts, to train their AI models. People are concerned that using private data to train generative AI could allow AI to replicate content created by humans, especially in areas such as art, music and literature. “In three, four, five years, whole segments of this creative industry may not exist anymore, because we will just be decimated,” Sasha Yanshin, a YouTube personality and co-founder of a travel recommendation site, told the Times.
How are companies fighting back?
Generative AI's thirst for data has presented a lucrative opportunity for companies with a large cache of private data. “Thanks to the scarcity of high-quality data and the enormous pressure and demand to build even bigger and better models, we are at a rare moment where data owners actually have some influence,” says MIT Technology Review. For example, music labels have opted to sue AI music companies Suno and Udio, claiming the two companies “used copyrighted music in their training data on an almost unimaginable scale,” allowing the AI models to generate songs that “imitate the qualities of real human sound recordings.”
In a bigger move, Cloudflare, a content delivery network and cloud security platform, created a tool designed to prevent AI bots from scraping text from websites. “We hear clearly that customers don’t want AI bots visiting their websites, especially ones that do so unfairly,” Cloudflare said in a blog post. While not a surefire solution, as more advanced bots can mimic how a real person uses a website, such a block can still limit a significant amount of bot activity.
However, several content owners are “torn between their instinct to protect their intellectual property and their eagerness to take money from those AI creators,” according to Axios. Platforms like Reddit and Stack Overflow are trying to balance the use of AI with data protection, but the bots’ “free access to web data is just the opening salvo of what will be an increasingly hot war.”