The UK government's spy agency is warning corporations of the risks of feeding sensitive data into public large language models, including ChatGPT, saying they are opening themselves up for a world of potential pain unless correctly managed.
Google, Microsoft and others are currently shoe-horning LLMs - the latest craze in tech - into their enterprise products, and Meta's LLaMa recently leaked. They are impressive but their responses can be flawed, and now Government Communications Headquarters (GCHQ) wants to highlight the security angle.
Authors David C, a tech director for Platform Research, and Paul J, a tech director for Data Science Research, ask: "Do loose prompts sink ships?" Yes, they conclude, in some cases.
The common worry is that an LLM may "learn" from a user's prompt and serve up that information to others querying it on similar matters.
"There is some cause for concern here, but not for the reason many consider. Currently, LLMs are trained, and then the resulting model is queried. An LLM does not (as of writing) automatically add information from queries to its model for others to query. That is, including information in a query will not result in that data being incorporated into the LLM."
Examples of sensitive data - quite apt in the current climate - could include a CEO found to be asking "how best to lay off an employee" or a person asking specific health or relationship questions, the agency says. We at The Reg would be worried - on many levels - if an exec was asking an LLM about redundancies.
The pair add: "Another risk, which increases as more organizations produce LLMs, is that queries stored online may be hacked, leaked, or more likely accidentally made publicly accessible. This could include potentially user-identifiable information. A further risk is that the operator of the LLM is later acquired by an organization with a different approach to privacy than was true when data was entered by users."
GCHQ is far from the first to highlight the potential for a security foul-up. Internal Slack messages from a senior general counsel at Amazon, seen by Insider, warned staff not to share corporate information with LLMs, saying there were instances of ChatGPT responses that appeared similar to Amazon's own internal data.
"This is important because your inputs may be used as training data for a further iteration of ChatGPT, and we wouldn't want its output to include or resemble our confidential information," she said, adding it already had.
Research by Cyberhaven Labs this month indicates sensitive data accounts for 11 percent of the information employees enter into ChatGPT. It analyzed ChatGPT usage for 1.6 million workers at companies that use its data security service, and found 5.6 percent had tried the chatbot at least once at work and 11 percent had input sensitive data.
JP Morgan, Microsoft and WalMart are among other corporations to warn their employees of the potential perils.
Back at GCHQ, Messrs David C and Paul J advise businesses not to input data they'd not like to be made public, to be very aware of the privacy policies of any cloud-provided LLMs they use, or to use a self-hosted LLM instead.
We have asked Microsoft, Google and OpenAI to comment. ®