The main Maven repository server used by DEV@cloud builds went offline earlier today, which may have resulted in hung or failed builds.
We have restarted the server and are adding additional monitoring so that we can detect this problem sooner in the future.
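We won't go into the details of that monitoring here, but as a rough illustration, an external health check for a Maven repository endpoint can be as simple as the sketch below. The repository URL and the alerting hook are placeholders, not our production configuration.

```python
# Minimal sketch of an external health check for a Maven repository endpoint.
# REPO_URL and the alert hook are placeholders, not actual CloudBees monitoring.
import urllib.request
import urllib.error

REPO_URL = "https://repository.example.com/"  # hypothetical repository base URL
TIMEOUT_SECONDS = 10

def repository_is_healthy(url: str = REPO_URL) -> bool:
    """Return True if the repository answers an HTTP request within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    if repository_is_healthy():
        print("OK: Maven repository responded")
    else:
        # Replace with a real alerting mechanism (pager, chat webhook, etc.).
        print("ALERT: Maven repository is unreachable")
```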
Please see http://status.cloudbees.com for status indicators and high-level system status information.
For support, please visit support.cloudbees.com or email support@cloudbees.com.
Wednesday, 8 April 2015
Friday, 6 March 2015
AWS Maintenance Event Affecting DEV@cloud
This issue is resolved.
---
CloudBees DEV@cloud Jenkins may be intermittently affected by an ongoing maintenance event within Amazon Web Services, which hosts DEV@cloud. While Amazon has tried to limit the impact of this event, we are seeing issues with internal network routing between some instances.
We are working to move Jenkins instances off the affected EC2 hosts. For the minority of Jenkins instances affected, this will result in a few minutes of downtime during the migration. We apologize in advance for any inconvenience, and we are working with Amazon to better understand how to avoid this in the future.
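For context on what such a migration involves: stopping and then starting an EBS-backed EC2 instance generally places it on different underlying hardware, which is the basic mechanism for moving off a degraded host. The sketch below shows the idea using boto3, the AWS SDK for Python; the instance ID and region are placeholders, and this is not our internal tooling. Note that a stop/start cycle discards any instance-store data and may change the public IP unless an Elastic IP is attached.

```python
# Sketch: move an EBS-backed EC2 instance onto different underlying hardware
# by stopping and starting it. Instance ID and region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"  # hypothetical instance

ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

print(f"{instance_id} restarted; it should now be on a different underlying host")
```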
Wednesday, 4 February 2015
Sonar service outage
In order to provide a more reliable database backend for our Sonar services, we migrated the Sonar database to a new database server. However, the data migration failed for some Sonar instances.
Sonar instances impacted by this issue were unable to connect to the database and were therefore unavailable during the outage.
To maintain the integrity of the affected Sonar instances, we have successfully re-migrated the impacted databases.
Upon restart, some Sonar instances were in an inconsistent state: plugins that had been updated via the UI were incompatible with the version of Sonar currently deployed at CloudBees. Our engineers have finished upgrading the impacted Sonar instances.
The Sonar service is now back online.
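For readers running their own SonarQube and planning a similar database move: the database connection is configured via sonar.jdbc.url (and the related properties) in sonar.properties, and a quick reachability check of the new database host before restarting Sonar catches the "cannot connect" failure mode early. A minimal sketch, with a placeholder host and port:

```python
# Quick TCP reachability check for a database host before restarting Sonar.
# HOST/PORT are placeholders; this only verifies network reachability,
# not credentials or schema state.
import socket

HOST = "sonar-db.example.com"  # hypothetical new database server
PORT = 3306                    # e.g. MySQL; use 5432 for PostgreSQL

def database_reachable(host: str = HOST, port: int = PORT, timeout: float = 5.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("reachable" if database_reachable() else "NOT reachable")
```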
Wednesday, 12 November 2014
Jenkins master outage
Earlier today we received a number of notices from Amazon about failing hardware. As we normally do, we took proactive action to replace the servers running on that hardware.
However, provisioning failed for several reasons, the most prominent being that one of the data centers we operate in was unable to support the SSD volumes we were upgrading to (a limitation of some Amazon accounts, which we discovered in the process). We had to roll back some changes and then reprovision the Jenkins masters. To provide availability, we recover servers to different data centers, restoring the Jenkins data on fresh volumes and fresh servers. This requires provisioning to work, and in this case the unavailability of SSD volumes prolonged recovery by a few hours for some masters. We are looking at how we can avoid these older data centers in the future to prevent a recurrence.
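As an illustration of the kind of guard that helps here (this is not our actual provisioning code), a provisioning step can probe whether an availability zone accepts SSD (gp2) volumes and fall back to standard magnetic volumes when it does not. A minimal sketch with boto3; the region and zone are placeholders:

```python
# Sketch: probe whether an availability zone accepts gp2 (SSD) volumes and
# fall back to standard (magnetic) volumes if not. Zone/region are placeholders.
import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2", region_name="us-east-1")
ZONE = "us-east-1a"  # hypothetical availability zone

def usable_volume_type(zone: str = ZONE) -> str:
    """Return 'gp2' if the zone accepts SSD volumes, otherwise 'standard'."""
    try:
        # Create and immediately clean up a tiny probe volume.
        probe = ec2.create_volume(AvailabilityZone=zone, Size=1, VolumeType="gp2")
        ec2.get_waiter("volume_available").wait(VolumeIds=[probe["VolumeId"]])
        ec2.delete_volume(VolumeId=probe["VolumeId"])
        return "gp2"
    except ClientError:
        return "standard"

print(f"Provisioning {usable_volume_type()} volumes in {ZONE}")
```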
Thursday, 9 October 2014
Jenkins masters restarted to apply security patch
An important security patch was applied just after 1 AM GMT today. This was done as a "soft" restart, in which running jobs were allowed to complete. Some masters, however, had long-running jobs and eventually had to be restarted; in that case, restarting the affected build is recommended (if there isn't an external automatic trigger that will do so).
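For reference, the "soft" restart described above is Jenkins' built-in safe restart, which waits for running builds to finish before restarting. On a self-managed Jenkins it can be triggered through the /safeRestart endpoint (or the safe-restart CLI command); a minimal sketch with placeholder URL and credentials:

```python
# Sketch: trigger a Jenkins "safe restart" (finish running builds, then restart)
# via the /safeRestart endpoint. URL and credentials are placeholders; this
# assumes CSRF protection is enabled, hence the crumb request.
import requests

JENKINS_URL = "https://jenkins.example.com"
AUTH = ("admin", "api-token")  # hypothetical user and API token

crumb = requests.get(f"{JENKINS_URL}/crumbIssuer/api/json", auth=AUTH).json()
headers = {crumb["crumbRequestField"]: crumb["crumb"]}

response = requests.post(f"{JENKINS_URL}/safeRestart", auth=AUTH,
                         headers=headers, allow_redirects=False)
# Jenkins typically answers with a redirect once the restart has been scheduled.
if response.status_code in (200, 302, 303):
    print("Safe restart requested")
else:
    print(f"Unexpected response: {response.status_code}")
```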
Monday, 29 September 2014
Status of CVE-2014-6271 and AWS reboots
We have received several inquiries via our support channels about how CloudBees systems have been affected by CVE-2014-6271 (aka "shellshock") and the ongoing alert we have posted about AWS reboots.
CVE-2014-6271 status
CloudBees systems including Forge (Git/SVN), RUN@cloud (Apps/Databases) and DEV@cloud Jenkins Masters have been patched against CVE-2014-6271. DEV@cloud slaves are already hardened to allow arbitrary process execution via build scripts in isolated containers, but are being patched as an additional precaution.
Side note: the Forge outage on Sept 24 was a result of maintenance required to perform these security upgrades.
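If you want to verify your own systems, the widely circulated check for CVE-2014-6271 passes a crafted function definition to bash through an environment variable and watches whether bash executes the trailing command. Here it is wrapped in Python for convenience:

```python
# The widely circulated CVE-2014-6271 ("shellshock") check: an unpatched bash
# executes code smuggled in through an environment-variable function definition.
import os
import subprocess

env = dict(os.environ, x="() { :;}; echo vulnerable")
result = subprocess.run(
    ["bash", "-c", "echo shellshock test"],
    env=env, capture_output=True, text=True,
)
print("VULNERABLE" if "vulnerable" in result.stdout else "patched (or mitigated)")
```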
AWS reboot status
There is an active alert on status.cloudbees.com warning about a massive set of reboots that Amazon is performing on its AWS systems (AWS is the primary provider of CloudBees computing resources). These reboots are not related to the shellshock alert, but may result in some small windows of service disruption. Where possible, we are rebooting servers ourselves ahead of the scheduled reboots to minimize disruption.
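For customers running their own EC2 fleets: the scheduled reboots appear as instance events in the EC2 API, so affected instances can be found and rebooted at a convenient time before Amazon does it. A minimal sketch using boto3; the region is a placeholder:

```python
# Sketch: list EC2 instances with scheduled events (e.g. maintenance reboots)
# so they can be rebooted proactively at a convenient time. Region is a placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

paginator = ec2.get_paginator("describe_instance_status")
for page in paginator.paginate(IncludeAllInstances=True):
    for status in page["InstanceStatuses"]:
        for event in status.get("Events", []):
            print(status["InstanceId"], event["Code"],
                  event.get("NotBefore"), event["Description"])
```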
Wednesday, 24 September 2014
Forge Service restored
In the process of applying a critical security patch, the Forge SVN, WebDAV and Maven repositories were made inaccessible starting at roughly 19:00 GMT. Service was mostly restored by 19:15, with all services available by 19:50.