Apple: Since you care about yOuR pRiVaCy, we'll train our AI on made-up emails

Apple, having starved its AI models of data by respecting customer privacy, plans to improve its chatbot suggestions by using made-up emails.

The iGiant says it will soon start using synthetic data - that is, data generated by computers instead of actual humans - to improve email summaries generated by Apple Intelligence for those who have opted into Device Analytics.

This ask-for-permission approach contrasts sharply with social media giant Meta, which recently said it will resume training its AI models on the posts produced by users in Europe unless they opt out.

Apple is using an undisclosed large language model to invent email messages on various topics. As an example, the Mac daddy cites the message, "Would you like to play tennis tomorrow at 11:30AM?"

By generating variations on this message using an AI model and converting these into embeddings - a vector math representation - Apple can then use a technique called differential privacy [PDF] to compare the synthetic embeddings to embeddings derived from actual email messages on opted-in users' devices, without revealing the contents of the genuine messages. The comparison tells Apple which synthetic emails most closely resemble real ones, nudging its training data toward the real thing.
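In rough outline, the scheme works something like the sketch below. Everything here is an illustrative stand-in - the toy embeddings, the k-ary randomized response mechanism, and all the numbers are assumptions, since Apple hasn't published its exact mechanism in this post - but the shape matches the description: each device reports only a noisy index identifying its nearest synthetic email, and the server debiases the aggregate histogram.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(true_index: int, k: int, epsilon: float) -> int:
    """k-ary randomized response: report the true index with probability
    e^eps / (e^eps + k - 1), otherwise a uniformly random other index.
    A stand-in for whatever local DP mechanism Apple actually uses."""
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return true_index
    other = rng.integers(0, k - 1)
    return int(other if other < true_index else other + 1)

def nearest_synthetic(real_emb: np.ndarray, synth_embs: np.ndarray) -> int:
    """Index of the synthetic embedding closest (by cosine similarity)
    to the embedding of a real, on-device email."""
    sims = synth_embs @ real_emb / (
        np.linalg.norm(synth_embs, axis=1) * np.linalg.norm(real_emb))
    return int(np.argmax(sims))

# Toy data: 5 synthetic email embeddings, 1,000 opted-in devices.
k, dim, epsilon = 5, 16, 2.0
synth_embs = rng.normal(size=(k, dim))
real_embs = rng.normal(size=(1000, dim))  # stand-ins for on-device emails

# Each device reports only a noisy index - never the email or its embedding.
reports = [randomized_response(nearest_synthetic(e, synth_embs), k, epsilon)
           for e in real_embs]

# The server debiases the noisy histogram to estimate how many devices
# really matched each synthetic message.
counts = np.bincount(reports, minlength=k).astype(float)
n = len(reports)
p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
q = (1 - p) / (k - 1)
estimated = (counts - n * q) / (p - q)
print(estimated)  # which synthetic emails best match real traffic
```

The point of the noise is plausible deniability: with k = 5 and epsilon = 2, any individual device reports its true nearest match only about 65 percent of the time, yet the debiased estimate over a thousand devices is accurate enough to tell Apple which synthetic emails to keep refining.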

"Synthetic data are created to mimic the format and important properties of user data, but do not contain any actual user generated content," Apple explains in a post to its machine learning research site.

"When creating synthetic data, our goal is to produce synthetic sentences or emails that are similar enough in topic or style to the real thing to help improve our models for summarization, but without Apple collecting emails from the device," it says."This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy."

Synthetic data is widely used in AI training but has several disadvantages, including the potential for bias, incompleteness, and inaccuracy, any of which can drag down model performance.

At the same time, it's private - it's highly unlikely that a model trained on invented information will emit valid personal data in response to a prompt. One hopes the LLM training Apple's AI isn't leaking personal info it may have picked up during its own training into Cupertino's neural networks.

While Apple's approach has afforded customers a level of privacy only grudgingly granted by rivals, it has also denied the iPhone maker training data that might have made Apple Intelligence more competitive. The biz was sued last month for exaggerating its AI capabilities, and anecdotally, it appears there's room for improvement.

Apple is already using this technique to improve text generation within email messages in its beta software. ®
