Microsoft AI Researchers Misconfigure Storage Container, Exposing Hoard of Company Information in Data Leak

Sep 27, 2023

Microsoft employees are undoubtedly getting a heads-up about configuring shared access to storage containers after news of a data leak involving 38 terabytes of AI information. AI researchers at the company made a mistake in crafting a URL for sharing files from an Azure storage container, giving anyone with the link full access to workstation backups and other stored data, along with write permissions.

The good news about the data leak is that it appears to have stayed in-house; while unauthorized employees may have been able to access the stored files, there is not yet any indication that the exposed data was actually accessed by anyone outside the company. Wiz, a third-party security firm, hit upon the issue during a scan, prompting Microsoft to revise its sharing policies.

Microsoft data leak exposed keys, credentials, private messages

While Microsoft has issued assurances that the data leak did not venture outside of the company and that none of its internal services or customers were threatened, the scope of available information and the fact that the misconfigured URL has apparently been accessible since 2021 are concerning.

Adding to the general concern is the fact that Microsoft's security reputation has taken several recent hits, including a high-profile breach of Outlook (which involved a stolen signing key) and an apparent unforced error in leaking confidential information about its Xbox gaming business to the public.

The core of this data leak is an oversight in creating SAS tokens for sharing purposes. The AI researchers had properly secured the account in all other respects, but did not seem to realize that the SAS URL they had published on GitHub to share AI models with other Microsoft staff granted far more expansive permissions than intended. This is something that is easy to do unwittingly with so-called "Account SAS" tokens, a problem that Microsoft's security blog recently noted.
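As a rough illustration of how this can happen, here is a minimal sketch using the azure-storage-blob Python SDK; the account and container names are placeholders, not details from the incident. An Account SAS generated with broad-looking defaults ends up covering every container in the storage account, with write and delete rights and a far-off expiry:

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import (
    AccountSasPermissions,
    ResourceTypes,
    generate_account_sas,
)

# Hypothetical values for illustration only; not taken from the actual incident.
ACCOUNT_NAME = "exampleresearchstore"
ACCOUNT_KEY = "<storage-account-key>"

# An Account SAS is scoped to the whole storage account, not a single
# container or blob. Broad permissions plus a distant expiry produce a
# link that exposes everything in the account to whoever holds the URL.
overly_broad_sas = generate_account_sas(
    account_name=ACCOUNT_NAME,
    account_key=ACCOUNT_KEY,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(
        read=True, write=True, delete=True, list=True
    ),
    expiry=datetime.now(timezone.utc) + timedelta(days=365 * 30),  # decades out
)

share_url = (
    f"https://{ACCOUNT_NAME}.blob.core.windows.net/ai-models?{overly_broad_sas}"
)
```

Because tokens signed this way are not recorded anywhere server-side, revoking one after the fact effectively means rotating the storage account key itself.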

Compounding the problem is that Account SAS tokens are the simplest and most straightforward to create, likely the reason the AI researchers used them. Service SAS and User Delegation SAS tokens are higher-security options that are also much easier to monitor and revoke, but they require a little more setup: a stored access policy in the former case, and Microsoft Entra (Azure AD) credentials rather than the account key in the latter.
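For comparison, here is a minimal sketch of the more constrained approach, again with placeholder names: a User Delegation SAS is signed with an Entra ID (Azure AD) identity via a user delegation key, scoped to a single container, limited to read and list access, and given a short expiry.

```python
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.blob import (
    BlobServiceClient,
    ContainerSasPermissions,
    generate_container_sas,
)

ACCOUNT_NAME = "exampleresearchstore"  # hypothetical account name
ACCOUNT_URL = f"https://{ACCOUNT_NAME}.blob.core.windows.net"

# Authenticate with an Entra ID (Azure AD) identity instead of the account key.
service_client = BlobServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())

now = datetime.now(timezone.utc)
delegation_key = service_client.get_user_delegation_key(
    key_start_time=now, key_expiry_time=now + timedelta(hours=2)
)

# Scoped to one container, read/list only, expiring in two hours.
scoped_sas = generate_container_sas(
    account_name=ACCOUNT_NAME,
    container_name="ai-models",  # hypothetical container name
    user_delegation_key=delegation_key,
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=now + timedelta(hours=2),
)

share_url = f"{ACCOUNT_URL}/ai-models?{scoped_sas}"
```

A token like this can also be cut off early by revoking the user delegation key or the signing identity's access, rather than rotating the account key.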

AI researchers exposed full workstation backups

The data leak would have been a bonanza for anyone looking to steal information on Microsoft's machine learning projects, with two AI researchers exposing backups of their entire workstations. That included not just research, but credentials that would have enabled further movement into Microsoft's network. Given that write permission was also available to anyone with the URL, injecting malware into the stored files would also have been entirely possible.

Microsoft's update on the situation says that its regular scans for access issues did pick up on the URL, but it was dismissed as a false positive. Had it not been located by outside security firm Wiz, there is no telling how long it would have persisted, as the token had been set to remain valid for decades.

While the incident highlights emerging security risks in the AI space to some degree, it is more a story about basic employee training and awareness as well as sound storage policy. Anyone accessing these shared files had a reasonable chance of noticing that their permissions went well beyond what one would expect, yet nothing was done until an outside security firm flagged the issue. Monitoring a data inventory as large as Microsoft's is certainly a challenge without easy answers, but employee awareness is a comparatively simple fix.
