Amazon explains outage that took out a large chunk of the internet

Amazon has explained the December 7th Amazon Web Services outage and promised more clarity if it happens again. CNBC reports that an automated capacity scaling feature triggered "unexpected behavior" from clients on Amazon's internal network. The resulting surge in connection activity overwhelmed the networking devices that link that internal network to the main AWS network.

Amazon said the failure itself hampered the teams trying to fix it, and they had to rely on logs to work out what had happened. Engineers also had to contend with a "latent issue" that prevented networking clients from backing off and giving systems a chance to recover, and they were "extremely deliberate" in restoring service to avoid breaking still-functional workloads.
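For readers unfamiliar with the term, "backing off" generally means that a client waits progressively longer between retries instead of reconnecting immediately. The sketch below illustrates one common form, exponential backoff with random jitter. It is not Amazon's client code; the function name `backoff_delays` and its parameters (`base`, `cap`, `attempts`, `rng`) are hypothetical and chosen only for illustration.

```python
import random


def backoff_delays(base=0.5, cap=30.0, attempts=6, rng=random.random):
    """Yield one randomized wait (in seconds) per retry attempt.

    Illustrative only: the names and default values are assumptions.
    """
    for attempt in range(attempts):
        # The ceiling doubles on every attempt but never exceeds `cap`.
        ceiling = min(cap, base * 2 ** attempt)
        # "Full jitter": wait a random fraction of the ceiling so that
        # many clients do not all retry at the same moment.
        yield rng() * ceiling


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(), start=1):
        print(f"retry {attempt}: wait up to {delay:.2f}s")
```

A client that skips this step retries as fast as it can, which is how a brief overload can turn into sustained congestion.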

AWS has temporarily disabled the scaling activity that led to the problem and won't switch it back on until fixes are in place. Amazon said a fix for the latent glitch is coming in two weeks. There is also an extra network configuration to protect devices in the event of a repeat failure.

Future crises might be easier to understand. A new version of the service status dashboard will be released in early 2022, and a multi-region support system will help Amazon get in touch with customers sooner. These changes may remove some of the mystery when services go dark, which matters when the victims include everything from Disney+ to Roomba vacuums.