Yesterday, users around the world faced widespread disruptions when trying to reach popular internet services. Platforms such as X (formerly Twitter), ChatGPT and even Downdetector, the website widely used as a reference for reporting internet outages, were unavailable for hours. The source of the disruption was a problem in Cloudflare’s infrastructure. Now that the problem has been fixed, the company’s CEO has given a detailed explanation of why it happened. According to him, the incident was the worst outage on the Cloudflare network since 2019.
Matthew Prince, the co-founder and CEO of Cloudflare, quickly responded to rumors circulating online and ruled out a cyberattack. In the first hours of the outage, many users and experts suspected that a massive DDoS attack had taken down a large part of the Internet. However, Prince explained in a blog post that the root cause was not external but an internal technical error. According to the published technical report, a change to the database permissions system caused unexpected behavior in the company’s bot management system.
Cloudflare, one of the main pillars of the Internet, manages and secures about 20% of all web traffic. The company’s “Bot Management” tool plays a vital role in distinguishing human traffic from crawling bots. On the day of the incident, the process that keeps the configuration file for the tool’s machine learning model up to date broke down. A change in the behavior of queries to the “ClickHouse” database caused that configuration file to be generated with a large number of duplicate rows. The sudden growth in size exceeded the memory capacity set aside for the file and eventually overwhelmed the system’s buffers.
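The failure mode described above, a generated file bloated by duplicate rows until it exceeds a fixed limit, can be illustrated with a short sketch. This is not Cloudflare’s code: the function names, the row format and the limit are all hypothetical, chosen only to show one defensive approach, which is to drop duplicates and fall back to the last known good file rather than fail when a new file is too large.

```python
# Illustrative sketch only; names, row format and limit are assumptions,
# not taken from Cloudflare's actual system.

def dedupe_rows(rows):
    """Return rows with duplicates removed, preserving first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def load_feature_file(rows, max_rows, last_good):
    """Pick the rows to use for a freshly generated file.

    Duplicates are removed first. If the result still exceeds max_rows
    (a hypothetical preallocated limit), the previous known good rows
    are returned instead of raising an error.
    """
    unique = dedupe_rows(rows)
    if len(unique) > max_rows:
        return last_good
    return unique


# A query that suddenly returns every row twice.
generated = [
    ("feature_a", "float"),
    ("feature_a", "float"),
    ("feature_b", "int"),
    ("feature_b", "int"),
]
previous = [("feature_a", "float")]

active = load_feature_file(generated, max_rows=2, last_good=previous)
fallback = load_feature_file(generated, max_rows=1, last_good=previous)
```

In this sketch, `active` holds the two unique rows because they fit within the limit, while `fallback` reuses the previous file because even the deduplicated rows exceed the limit of one.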
The result of this technical error was the failure of the main proxy system, which is responsible for processing customer traffic. In effect, Cloudflare’s security system mistakenly identified traffic from ordinary users as suspicious activity or bots and blocked their access. This was especially true for companies that had strict bot-blocking settings in place. Cloudflare executives stressed that no cyberattack or DNS issue was involved, and that artificial intelligence technologies were not directly responsible for the outage, although tools for blocking AI crawlers were part of the affected system.

Source: Cloudflare
RCO NEWS



