Timeframe
25th August 2015 - 12:00am UTC to 3:00am UTC.
Scope of outage - DEV@cloud
- No new builds could be launched on DEV@cloud provided executors.
- Existing builds were able to complete.
- On-Premise Executors (OPE) and builds were not impacted.
Root Cause
A network configuration issue prevented communication between build web services.
Data Loss
No builds in flight were lost, and no data was lost during this outage.
Complications
Recovery from the outage took longer than expected because we needed to increase the size of our build farm to catch up with the builds that had stacked up during the outage. Dynamically sizing the farm is not particularly fast, as we usually don't need to ramp up capacity that quickly.
Proactive Steps
- Catalogue and monitor the impacted internal service directly.
- Increase capacity scaling rate.
- Implement further changes identified in internal Post Outage Review.