A grave oversight by Microsoft's (NASDAQ:MSFT) AI research division resulted in the leak of 38 terabytes (TB) of sensitive data, discovered nearly three years after the initial incident.
The leak was traced back to a misconfigured Azure Blob Storage container, according to cloud security firm Wiz.
In July 2020, Microsoft inadvertently shared the URL for the Azure storage container while contributing open-source artificial intelligence (AI) models to a public GitHub repository.
The exposed data included Microsoft employee information, secret keys and an archive of internal messages.
BREAKING: Wiz Research discovers a massive 38TB data leak by Microsoft AI researchers, including 30,000+ internal Teams messages. Here's what you need to know pic.twitter.com/2V8u9IekGV
— Wiz (@wiz_io) September 18, 2023
Hard to monitor and avoid
"AI unlocks huge potential for tech companies. However, as data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards," Wiz CTO & co-founder Ami Luttwak told BleepingComputer.
"This emerging technology requires large sets of data to train on.
"With many development teams needing to manipulate massive amounts of data, share it with their peers or collaborate on public open-source projects, cases like Microsoft's are increasingly hard to monitor and avoid."
Security risk
Wiz researchers reported the lapse to Microsoft on June 22, 2023, prompting the company to block external access by revoking the shared access signature (SAS) token on June 24.
The overly permissive SAS token granted "full control" rather than read-only access, meaning anyone with the link could view, modify or delete the shared files.
Wiz highlighted that while SAS tokens can offer secure, delegated access when used correctly, they are difficult to manage and track within the Azure portal.
Given that lack of monitoring and governance, SAS tokens pose a security risk, and their usage should be as limited as possible.
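For context, here is a minimal sketch of the kind of tightly scoped token that guidance implies, using the Azure SDK for Python (azure-storage-blob); the account, container and blob names are hypothetical. The token is limited to a single blob, is read-only, and expires after one hour, in contrast to an account-wide, full-control, long-lived token.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Hypothetical names used for illustration only.
ACCOUNT_NAME = "examplestorage"
ACCOUNT_KEY = "<storage-account-key>"
CONTAINER = "ai-models"
BLOB = "model-weights.ckpt"

# Scope the token to one blob, read-only, expiring in one hour --
# the opposite of an account-wide, full-control, long-lived token.
sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    blob_name=BLOB,
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# The resulting URL can be shared without exposing anything else
# in the storage account.
url = f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}/{BLOB}?{sas_token}"
print(url)
```

A user delegation SAS, backed by Microsoft Entra ID credentials instead of the account key, further narrows the blast radius if a link does leak.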