Apple, having starved its AI models of data by respecting customer privacy, plans to improve its AI-generated email summaries using made-up emails.
The iGiant says it will soon start using synthetic data - that is, data generated by computers instead of actual humans - to improve email summaries generated by Apple Intelligence for those who have opted into Device Analytics.
This ask-for-permission approach contrasts sharply with social media giant Meta, which recently said it will resume training its AI models on the posts produced by users in Europe unless they opt out.
Apple is using an undisclosed large language model to invent email messages on various topics. As an example, the Mac daddy cites the message, "Would you like to play tennis tomorrow at 11:30AM?"
By generating variations on this message with an AI model and converting them into embeddings - vector representations that capture a message's topic and phrasing - Apple can compare, on the device, those synthetic embeddings against embeddings derived from the actual emails of opted-in users. A technique called differential privacy [PDF] ensures that only noisy, aggregate signals about which synthetic messages most resemble real mail ever leave the handset, without revealing the contents of the genuine messages. This helps make the training data as close to the real thing as possible.
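In rough outline, the scheme might look something like the sketch below. Everything in it is illustrative: the embed() function is a toy stand-in for a real sentence-embedding model, the synthetic variants are invented, and k-ary randomized response is used as a simple stand-in differential-privacy mechanism - Apple has not published its exact embedding model, privacy budget, or aggregation protocol.

```python
# Illustrative sketch only: Apple's real embedding model, epsilon, and
# aggregation protocol are not public. All names here are hypothetical.
import hashlib
import random

import numpy as np


def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: hash the text into a deterministic unit vector.
    A production system would use a learned sentence-embedding model."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


# Server side: LLM-generated synthetic variants of a seed message.
synthetic = [
    "Would you like to play tennis tomorrow at 11:30AM?",
    "Fancy a tennis match tomorrow morning?",
    "Dinner plans for Friday night?",
    "Reminder: your package ships Monday.",
]
syn_emb = np.stack([embed(s) for s in synthetic])


def device_report(user_emails: list[str], epsilon: float = 1.0) -> int:
    """On-device step: find the synthetic variant closest to the user's
    mail, then report its index via k-ary randomized response, so the
    server cannot tell which variant any one device actually picked."""
    user_emb = np.stack([embed(e) for e in user_emails])
    # Cosine similarity: vectors are unit-norm, so a dot product suffices.
    sims = user_emb @ syn_emb.T
    true_idx = int(np.argmax(sims.max(axis=0)))
    # Keep the true index with probability e^eps / (e^eps + k - 1);
    # otherwise report one of the other k - 1 indices uniformly.
    k = len(synthetic)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if random.random() < p:
        return true_idx
    other = random.randrange(k - 1)
    return other if other < true_idx else other + 1


# Server side: aggregate noisy reports across many devices. The most
# frequent indices point at the synthetic messages that best resemble
# real traffic, without exposing any individual mailbox.
reports = [device_report(["Tennis at noon tomorrow?", "Invoice attached."])
           for _ in range(1000)]
print(np.bincount(reports, minlength=len(synthetic)))
```

The server sees only the noisy histogram; per Apple's description, the most frequently selected synthetic messages then inform further rounds of variant generation, gradually steering the synthetic corpus toward the topics and language of real mail.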
"Synthetic data are created to mimic the format and important properties of user data, but do not contain any actual user generated content," Apple explains in a post to its machine learning research site.
"When creating synthetic data, our goal is to produce synthetic sentences or emails that are similar enough in topic or style to the real thing to help improve our models for summarization, but without Apple collecting emails from the device," it says."This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy."
Synthetic data is widely used in AI training but has drawbacks, including potential bias, incompleteness, inaccuracies, and degraded model performance.
At the same time, it's private - it's highly unlikely that a model trained on invented information will emit valid personal data in response to a prompt. One hopes the LLM generating Apple's synthetic emails isn't leaking personal info it picked up during its own training into Cupertino's neural networks.
While Apple's approach has afforded customers a level of privacy only grudgingly granted by rivals, it has also denied the iPhone maker training data that might have made Apple Intelligence more competitive. The biz was sued last month for exaggerating its AI capabilities, and anecdotally, it appears there's room for improvement.
Apple is already using this technique to improve text generation within email messages in its beta software. ®