Microsoft is warning business customers that a bug has caused critical logs to be partially lost for nearly a month, putting companies that rely on this data to detect unauthorized activity at risk.
Business Insider first reported the issue earlier this month, saying that Microsoft had begun notifying customers that their log data had not been collected consistently between September 2 and September 19.
The lost logs contain security data commonly used to monitor for suspicious traffic, behavior, and login attempts on a network, so their loss increases the chance that attacks go undetected.
A preliminary Post Incident Review (PIR) sent to customers and shared by Microsoft MVP Joao Ferreira sheds further light on the issue, saying that logging issues for some services were worse and persisted until October 3.
Microsoft’s assessment shows that the following services are affected, each with varying degrees of log disruption:
- Microsoft Entra: Potentially incomplete sign-in logs and activity logs. Entra logs flowing through Azure Monitor to Microsoft Security products, including Microsoft Sentinel, Microsoft Purview, and Microsoft Defender for Cloud, were also affected.
- Azure Logic Apps: Experienced intermittent gaps in telemetry data in Log Analytics, resource logs, and Logic Apps diagnostic settings.
- Azure Healthcare APIs: Partially incomplete diagnostic logs.
- Microsoft Sentinel: Potential gaps in security-related logs or events, impacting customers’ ability to analyze data, detect threats, or generate security alerts.
- Azure Monitor: Observed gaps or reduced results when running queries against log data from affected services. Where customers had configured alerts based on this log data, those alerts may also have been affected.
- Azure Trusted Signing: Experienced partially incomplete SignTransaction and SignHistory logs, leading to reduced signing log volume and underbilling.
- Azure Virtual Desktop: Logs were partially incomplete in Application Insights. AVD's core connectivity and functionality were not affected.
- Power Platform: Experienced minor discrepancies affecting data in various reports, including Analytics reports in the Admin and Maker portals, licensing reports, data exports to Data Lake, Application Insights, and activity logging.
Microsoft says the logging failure was caused by a bug introduced while fixing another issue in the company's log collection service.
“The initial change was intended to address a limit in the logging service, but when it was implemented it inadvertently caused a deadlock when the agent was told to change the telemetry upload endpoint in a fast-changing manner while a dispatch was in transit to the initial endpoint. This resulted in a gradual deadlock of threads in the dispatching component, preventing the agent from uploading telemetry. The deadlock only affected the dispatching mechanism within the agent, while other functionalities worked normally, including collection and recording data to the agent’s local durable cache. Restarting the agent or operating system resolves the deadlock and upon startup the agent uploads the data it has in the local cache. [In some situations, the] log data collected by the agent was larger than the local agent cache limit before a restart occurred, and in these cases the agent overwrote the oldest data in the cache (circular buffer that held the most recent data, up to the maximum size). The log data that exceeds the cache size limit cannot be recovered.”
❖Microsoft
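The data loss Microsoft describes follows from how a fixed-size circular buffer behaves when nothing drains it. The short Python sketch below illustrates that behavior only. It is not Microsoft's agent code: the `AgentCache` class, the `simulate` function, the record names, and the sizes are all hypothetical and chosen for illustration.

```python
from collections import deque


class AgentCache:
    """Hypothetical stand-in for the agent's local cache (a circular buffer)."""

    def __init__(self, max_records):
        # A deque with maxlen silently drops the oldest item once it is full.
        self.records = deque(maxlen=max_records)
        self.overwritten = 0

    def collect(self, record):
        # Collection keeps working even while uploads are stalled.
        if len(self.records) == self.records.maxlen:
            self.overwritten += 1  # the oldest record is about to be lost
        self.records.append(record)

    def upload_all(self):
        # After a restart, whatever is still cached gets uploaded.
        uploaded = list(self.records)
        self.records.clear()
        return uploaded


def simulate(total_records, cache_size):
    """Collect records with uploads stalled, then 'restart' and upload."""
    cache = AgentCache(cache_size)
    for i in range(total_records):
        cache.collect(f"event-{i}")
    return cache.upload_all(), cache.overwritten


uploaded, lost = simulate(total_records=10, cache_size=4)
print(uploaded)  # ['event-6', 'event-7', 'event-8', 'event-9']
print(lost)      # 6
```

In this toy run, only the four most recent records survive the stall. The six older ones were overwritten before the restart, which mirrors the unrecoverable portion described in the quote.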
Microsoft says that although it rolled out the original fix following safe deployment practices, it failed to identify the new problem at the time, and the problem took a few days to detect.
In a statement to TechCrunch, Microsoft Vice President John Sheehan said the bug has now been resolved and all customers have been notified.
However, cybersecurity expert Kevin Beaumont says he knows of at least two companies with missing log data that have not received any notifications.
This incident occurred a year after Microsoft was criticized by CISA and lawmakers for not providing, free of charge, enough log data to detect breaches, instead requiring customers to pay for it.
In July 2023, Microsoft disclosed that Chinese hackers had stolen a Microsoft signing key that allowed them to breach corporate and government Microsoft Exchange and Microsoft 365 accounts and steal email.
Although Microsoft has still not determined how the key was stolen, the US government first discovered the attacks using Microsoft’s advanced logging data.
However, these advanced logging capabilities were only available to Microsoft customers who paid for Microsoft’s Purview Audit (Premium) logging feature.
As a result, Microsoft was widely criticized for not providing this additional log data for free so that organizations could quickly detect advanced attacks.
In partnership with CISA, the Office of Management and Budget (OMB), and the Office of the National Cyber Director (ONCD), Microsoft expanded its free logging capabilities to all Purview Audit (Standard) customers in February 2024.