Samsung’s Sensitive Data Becomes a Part of ChatGPT as Employees Use It for Work Shortcuts

Apr 21, 2023

While ChatGPT may be a productivity enhancer, organizations need to carefully consider the situations in which it is used. Samsung has learned this lesson the hard way, as employees unwittingly fed sensitive data into the chatbot’s training pipeline while reviewing code and preparing internal presentations.

A South Korean newspaper reports that some of the tech giant’s developers fed source code to ChatGPT to check it for errors and optimization tweaks, and at least one other employee made use of it to turn recordings of private internal meetings into presentations.

Fascination with ChatGPT blinds some to its risks, hidden costs

OpenAI, ChatGPT’s developer, openly states that anything put into the system (or its other free services) will be logged and used as training data, unless the user has a paid subscription. One would think the lessons of Facebook and assorted personal information brokers in recent years would have driven home the point that “free” isn’t really free and carries an implicit cost in sensitive data, but the novelty and potential of ChatGPT seem to have caused some to forget this.

In terms of protecting sensitive data, OpenAI says that it does screen out personally identifying information for all users. The system’s internal workings are not very transparent, however, so it is not entirely clear how far this goes beyond flagging obvious formats (like passport numbers or street addresses). Users also do not presently have a means to see or edit what data the system has stored, but legal pressure being applied in Italy might change this in the near future.
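
OpenAI has not published how this screening works, but the limits of flagging “obvious formats” are easy to see in a sketch. The patterns and the redact_pii helper below are purely illustrative assumptions about what a pattern-matching filter looks like, not OpenAI’s actual mechanism:

import re

# Illustrative patterns only -- OpenAI's actual screening is not public.
# Pattern matching of this kind catches rigid formats, but free-form
# sensitive content (source code, meeting transcripts) passes straight through.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "passport_no": re.compile(r"\b[A-Z]?\d{8,9}\b"),
}

def redact_pii(text: str) -> str:
    """Replace pattern-matched PII with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text

print(redact_pii("Reach Jane at jane.doe@example.com, SSN 123-45-6789."))
# Reach Jane at [REDACTED EMAIL], SSN [REDACTED US_SSN].

Note that nothing in a filter like this would have flagged Samsung’s source code or meeting notes, which is precisely the gap the incident exposed.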

This has led some companies to ban ChatGPT from the workplace. Samsung had actually banned it earlier this year, but lifted the ban just weeks before this story broke. Major questions remain about how accurate and consistent these language models can be, what kinds of security risks they create, and where data fed into them might ultimately wind up. The banking industry has been the most active and consistent in banning ChatGPT, due to obvious concerns about losing control of very sensitive data, but major companies in other industries (such as Verizon and Amazon) have also decided that there are too many unknowns at this point to take the risk.

Sensitive data used in chatbot training could appear in unpredictable places

To keep sensitive data from being inadvertently logged when an employee tries to automate portions of their job, organizations must either pay for a ChatGPT subscription or fill out an opt-out form (which takes some weeks to process). However, they must also consider the possibility that employees will simply plug company data into the free tier of ChatGPT from a personal or unmonitored device, particularly if they are trying to conceal the fact that they are using it. Data lost this way may wind up not just in ChatGPT’s logs, but in any other service that OpenAI runs (or may create in the future).

Samsung’s immediate response to the data breach has been to sharply limit the amount of data that each employee can upload to ChatGPT, and to send out a company-wide advisory about safe use of the platform. It is also reportedly considering building its own internal chatbot for tasks that potentially involve sensitive data.
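
Reports did not describe how the new limit is enforced, but a per-prompt size cap is simple to express. The sketch below is a guess at such a control, assuming a hypothetical internal proxy; the function name check_upload and the 1024-byte value are illustrative, not Samsung’s actual configuration:

# Minimal sketch of a per-prompt upload cap, assuming a hypothetical
# company proxy sits between employees and ChatGPT. The 1024-byte cap
# is illustrative; Samsung's actual limit and mechanism were not disclosed.
MAX_PROMPT_BYTES = 1024

class PromptTooLargeError(Exception):
    pass

def check_upload(prompt: str) -> str:
    """Pass a prompt through only if its UTF-8 size is under the cap."""
    size = len(prompt.encode("utf-8"))
    if size > MAX_PROMPT_BYTES:
        raise PromptTooLargeError(
            f"Prompt is {size} bytes; the limit is {MAX_PROMPT_BYTES}. "
            "Do not paste source code or meeting transcripts."
        )
    return prompt

check_upload("Summarize these release notes in three bullet points.")  # passes
# check_upload(entire_source_file)  # would raise PromptTooLargeError

A size cap of this kind limits bulk leakage but cannot tell sensitive bytes from harmless ones, which is presumably why Samsung paired it with an advisory and is weighing an internal chatbot.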
