Creators demand tech giants fess up and pay for all that AI training data

Governments are allowing AI developers to steal content - both creative and journalistic - for fear of upsetting the tech sector and damaging investment, a UK Parliamentary committee heard this week.

Despite a tech industry figure insisting that the "original sin" of text and data mining had already occurred and that content creators and legislators should move on, a joint committee of MPs heard from publishers and a composer angered by the tech industry's unchecked exploitation of copyrighted material.

The Culture, Media and Sport Committee and Science, Innovation and Technology Committee asked composer Max Richter how he would know if "bad-faith actors" were using his material to train AI models.

"There's really nothing I can do," he told MPs. "There are a couple of music AI models, and it's perfectly easy to make them generate a piece of music that sounds uncannily like me. That wouldn't be possible unless it had hoovered up my stuff without asking me and without paying for it. That's happening on a huge scale. It's obviously happened to basically every artist whose work is on the internet."

Richter, whose work has been used in a number of major film and television scores, said the consequences for creative musicians and composers would be dire.

"You're going to get a vanilla-ization of music culture as automated material starts to edge out human creators, and you're also going to get an impoverishing of human creators," he said. "It's worth remembering that the music business in the UK is a real success story. It generated £7.6 billion in income last year, with over 200,000 people employed. That is a big impact. If we allow the erosion of copyright, which is really how value is created in the music sector, then we're going to be in a position where there won't be artists in the future."

Speaking earlier, former Google staffer James Smith said much of the damage from text and data mining had likely already been done.

"The original sin, if you like, has happened," said Smith, co-founder and chief executive of Human Native AI. "The question is, how do we move forward? I would like to see the government put more effort into supporting licensing as a viable alternative monetization model for the internet in the age of these new AI agents."

But representatives of publishers were not so sanguine.

Matt Rogerson, director of global public policy and platform strategy at the Financial Times, said: "We can only deal with what we see in front of us and [that is] people taking our content, using it for the training, using it in substitutional ways. So from our perspective, we'll prosecute the same argument in every country where we operate, where we see our content being stolen."

The risk, if the situation continued, was a hollowing out of creative and information industries, he said.

Rogerson said an FT-commissioned study found that 1,000 unique bots were scraping data from 3,000 publisher websites. "We don't know who those bots work with, but we know that they're working with AI companies. On average, each publisher is being targeted by 15 bots extracting data for AI models, and those bots are reselling that data to AI platforms for money."
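Publishers typically spot this kind of scraping by examining the user-agent strings in their web server access logs. The following is a minimal sketch of that approach, assuming a Common Log Format log; the bot tokens listed (GPTBot, CCBot, ClaudeBot, Bytespider) are real, publicly documented AI-crawler user agents, but the log lines and the `count_ai_bot_hits` helper are illustrative, not taken from the FT study:

```python
import re
from collections import Counter

# Publicly documented user-agent tokens for AI-training crawlers.
AI_BOT_TOKENS = ["GPTBot", "CCBot", "ClaudeBot", "Bytespider"]

def count_ai_bot_hits(log_lines):
    """Tally requests from known AI-crawler user agents in an access log."""
    hits = Counter()
    for line in log_lines:
        # In Common Log Format, the user agent is the last quoted field.
        quoted = re.findall(r'"([^"]*)"', line)
        if not quoted:
            continue
        user_agent = quoted[-1]
        for token in AI_BOT_TOKENS:
            if token in user_agent:
                hits[token] += 1
    return hits

# Illustrative log lines, not real traffic.
sample_log = [
    '1.2.3.4 - - [21/Mar/2025:10:00:00 +0000] "GET /article HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '5.6.7.8 - - [21/Mar/2025:10:00:01 +0000] "GET /article HTTP/1.1" 200 512 "-" "CCBot/2.0 (https://commoncrawl.org/faq/)"',
    '9.9.9.9 - - [21/Mar/2025:10:00:02 +0000] "GET /article HTTP/1.1" 200 512 "-" "Mozilla/5.0 (ordinary browser)"',
]
print(count_ai_bot_hits(sample_log))
```

In practice this only catches crawlers that identify themselves honestly; bots that spoof a browser user agent require IP-range or behavioral analysis instead, which is partly why Rogerson says publishers "can't see who's stolen our content."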

Asked about the "unintended consequences" of requiring AI companies to show creative and information industries how they obtain and use their content, and to compensate them for it, Rogerson said tech companies could simply absorb lower margins, but that was a step governments seemed reluctant to take.

"The problem is we can't see who's stolen our content. We're just at this stage where these very large companies, which usually make margins of 90 percent, might have to take some smaller margin, and that's clearly going to be upsetting for their investors. But that doesn't mean they shouldn't. It's just a question of right and wrong and where we pitch this debate. Unfortunately, the government has pitched it in thinking that you can't reduce the margin of these big tech companies; otherwise, they won't build a datacenter."

Sajeeda Merali, Professional Publishers Association chief executive, said that while the AI sector argued that transparency over data scraping and ML training data would be commercially sensitive, its real concern was that publishers would then ask for fair value in exchange for that data.

Meanwhile, publishers were also concerned that if they opted out of sharing data for ML training, they would be penalized in search engine results.
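The opt-out mechanism at issue here is typically robots.txt. A hedged, illustrative fragment, assuming a publisher wants to refuse AI-training crawlers while remaining in search indexes (GPTBot is OpenAI's crawler; Google-Extended is Google's AI-training token, which Google treats separately from Googlebot, its search crawler):

```
# Illustrative robots.txt: refuse AI-training crawlers, keep search crawlers.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
```

The publishers' worry is precisely that this separation may not hold in practice: where one company operates both the search crawler and the AI products, opting out of training data could carry consequences for search visibility.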

The debate around data used to train LLMs intensified after OpenAI's ChatGPT landed in 2022. The company is now valued at around $300 billion, Microsoft has a $10 billion partnership with OpenAI, and Google and Facebook are among the other companies developing their own large language models.

Last year, Dan Conway, CEO of the UK's Publishers Association, told the House of Lords Communications and Digital Committee that large language models were infringing copyrighted content on an "absolutely massive scale," arguing that the Books3 database - which lists 120,000 pirated book titles - had been entirely ingested. ®
