The main Maven repository server used by DEV@cloud builds went offline earlier today, which may have resulted in hung or failed builds.
We have restarted the server and are adding additional monitoring so that we can detect this problem sooner in the future.
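We won't go into the details of that monitoring here, but as a rough illustration, an external health check for a Maven repository endpoint can be as simple as the sketch below. The repository URL and the alerting hook are placeholders, not our production configuration.

```python
# Minimal sketch of an external health check for a Maven repository endpoint.
# REPO_URL and the alert hook are placeholders, not actual CloudBees monitoring.
import urllib.request
import urllib.error

REPO_URL = "https://repository.example.com/"  # hypothetical repository base URL
TIMEOUT_SECONDS = 10

def repository_is_healthy(url: str = REPO_URL) -> bool:
    """Return True if the repository answers an HTTP request within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    if repository_is_healthy():
        print("OK: Maven repository responded")
    else:
        # Replace with a real alerting mechanism (pager, chat webhook, etc.).
        print("ALERT: Maven repository is unreachable")
```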
Please see http://status.cloudbees.com for status indicators and high-level system status information.
For support, please visit support.cloudbees.com or email support@cloudbees.com.
Wednesday, 8 April 2015
Friday, 6 March 2015
AWS Maintenance Event Affecting DEV@cloud
This issue is resolved.
---
CloudBees DEV@cloud Jenkins may be intermittently affected by an ongoing maintenance event within Amazon Web Services, which hosts DEV@cloud. While Amazon has tried to limit the impact of this event, we are seeing issues with internal network routing between some instances.
We are working to move Jenkins instances off the affected EC2 hosts. For the minority of Jenkins instances affected, this will result in a few minutes of downtime during the migration. We apologize in advance for any inconvenience, and we are working with Amazon to better understand how to avoid this in the future.
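For context on what such a migration involves: stopping and then starting an EBS-backed EC2 instance generally places it on different underlying hardware, which is the basic mechanism for moving off a degraded host. The sketch below shows the idea using boto3, the AWS SDK for Python; the instance ID and region are placeholders, and this is not our internal tooling. Note that a stop/start cycle discards any instance-store data and may change the public IP unless an Elastic IP is attached.

```python
# Sketch: move an EBS-backed EC2 instance onto different underlying hardware
# by stopping and starting it. Instance ID and region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"  # hypothetical instance

ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

print(f"{instance_id} restarted; it should now be on a different underlying host")
```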
Wednesday, 4 February 2015
Sonar service outage
In order to provide a more reliable database backend for our Sonar services, we migrated the Sonar database to a new database server. However, the data migration failed for some Sonar instances.
Sonar instances impacted by this issue were unable to connect to the database and were therefore unavailable during the outage.
To maintain the integrity of the affected Sonar instances, we have successfully re-migrated the impacted databases.
Upon restart, some Sonar instances were in an inconsistent state: plugins that had been updated via the UI were incompatible with the version of Sonar currently deployed at CloudBees. Our engineers have finished upgrading the impacted Sonar instances.
The Sonar service is now back online.
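For readers running their own SonarQube and planning a similar database move: the database connection is configured via sonar.jdbc.url (and the related properties) in sonar.properties, and a quick reachability check of the new database host before restarting Sonar catches the "cannot connect" failure mode early. A minimal sketch, with a placeholder host and port:

```python
# Quick TCP reachability check for a database host before restarting Sonar.
# HOST/PORT are placeholders; this only verifies network reachability,
# not credentials or schema state.
import socket

HOST = "sonar-db.example.com"  # hypothetical new database server
PORT = 3306                    # e.g. MySQL; use 5432 for PostgreSQL

def database_reachable(host: str = HOST, port: int = PORT, timeout: float = 5.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("reachable" if database_reachable() else "NOT reachable")
```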
Wednesday, 12 November 2014
Jenkins master outage
Earlier today we received a number of notices from Amazon about failing hardware. As we normally do, we took proactive action to replace the servers running on that hardware.
However, provisioning failed for several reasons, the most prominent being that one of the data centers we operate in was unable to support the SSD volumes we were upgrading to (a limitation of some Amazon accounts, which we discovered in the process). We had to roll back some changes and then reprovision the Jenkins masters. To provide availability, we recover servers to different data centers, restoring the Jenkins data on fresh volumes and fresh servers. This requires provisioning to work, and in this case the unavailability of SSD volumes prolonged recovery by a few hours for some masters. We are looking at how we can avoid these older data centers in the future to prevent a recurrence.
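As an illustration of the kind of guard that helps here (this is not our actual provisioning code), a provisioning step can probe whether an availability zone accepts SSD (gp2) volumes and fall back to standard magnetic volumes when it does not. A minimal sketch with boto3; the region and zone are placeholders:

```python
# Sketch: probe whether an availability zone accepts gp2 (SSD) volumes and
# fall back to standard (magnetic) volumes if not. Zone/region are placeholders.
import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2", region_name="us-east-1")
ZONE = "us-east-1a"  # hypothetical availability zone

def usable_volume_type(zone: str = ZONE) -> str:
    """Return 'gp2' if the zone accepts SSD volumes, otherwise 'standard'."""
    try:
        # Create and immediately clean up a tiny probe volume.
        probe = ec2.create_volume(AvailabilityZone=zone, Size=1, VolumeType="gp2")
        ec2.get_waiter("volume_available").wait(VolumeIds=[probe["VolumeId"]])
        ec2.delete_volume(VolumeId=probe["VolumeId"])
        return "gp2"
    except ClientError:
        return "standard"

print(f"Provisioning {usable_volume_type()} volumes in {ZONE}")
```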
Thursday, 9 October 2014
Jenkins masters restarted to apply security patch
An important security patch was applied just after 1 AM GMT today. This was done as a "soft" restart, in which running jobs were allowed to complete. Some masters, however, had long-running jobs and eventually had to be restarted; in that case, restarting the affected build is recommended (if there isn't an external automatic trigger that will do so).
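For reference, the "soft" restart described above is Jenkins' built-in safe restart, which waits for running builds to finish before restarting. On a self-managed Jenkins it can be triggered through the /safeRestart endpoint (or the safe-restart CLI command); a minimal sketch with placeholder URL and credentials:

```python
# Sketch: trigger a Jenkins "safe restart" (finish running builds, then restart)
# via the /safeRestart endpoint. URL and credentials are placeholders; this
# assumes CSRF protection is enabled, hence the crumb request.
import requests

JENKINS_URL = "https://jenkins.example.com"
AUTH = ("admin", "api-token")  # hypothetical user and API token

crumb = requests.get(f"{JENKINS_URL}/crumbIssuer/api/json", auth=AUTH).json()
headers = {crumb["crumbRequestField"]: crumb["crumb"]}

response = requests.post(f"{JENKINS_URL}/safeRestart", auth=AUTH,
                         headers=headers, allow_redirects=False)
# Jenkins typically answers with a redirect once the restart has been scheduled.
if response.status_code in (200, 302, 303):
    print("Safe restart requested")
else:
    print(f"Unexpected response: {response.status_code}")
```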
Monday, 29 September 2014
Status of CVE-2014-6271 and AWS reboots
We have received several inquiries via our support channels about how CloudBees systems have been affected by CVE-2014-6271 (aka "shellshock") and the ongoing alert we have posted about AWS reboots.
CVE-2014-6271 status
CloudBees systems including Forge (Git/SVN), RUN@cloud (Apps/Databases) and DEV@cloud Jenkins Masters have been patched against CVE-2014-6271. DEV@cloud slaves are already hardened to allow arbitrary process execution via build scripts in isolated containers, but are being patched as an additional precaution.
Side note: the Forge outage on Sept 24 was a result of maintenance required to perform these security upgrades.
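If you want to verify your own systems, the widely circulated check for CVE-2014-6271 passes a crafted function definition to bash through an environment variable and watches whether bash executes the trailing command. Here it is wrapped in Python for convenience:

```python
# The widely circulated CVE-2014-6271 ("shellshock") check: an unpatched bash
# executes code smuggled in through an environment-variable function definition.
import os
import subprocess

env = dict(os.environ, x="() { :;}; echo vulnerable")
result = subprocess.run(
    ["bash", "-c", "echo shellshock test"],
    env=env, capture_output=True, text=True,
)
print("VULNERABLE" if "vulnerable" in result.stdout else "patched (or mitigated)")
```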
AWS reboot status
There is an active alert on status.cloudbees.com warning about a massive set of reboots that Amazon is performing on its AWS systems (AWS is the primary provider of CloudBees computing resources). These reboots are not related to the shellshock alert, but may result in some small windows of service disruption. Where possible, we are rebooting servers ourselves ahead of the scheduled reboots to minimize disruption.
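For customers running their own EC2 fleets: the scheduled reboots appear as instance events in the EC2 API, so affected instances can be found and rebooted at a convenient time before Amazon does it. A minimal sketch using boto3; the region is a placeholder:

```python
# Sketch: list EC2 instances with scheduled events (e.g. maintenance reboots)
# so they can be rebooted proactively at a convenient time. Region is a placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

paginator = ec2.get_paginator("describe_instance_status")
for page in paginator.paginate(IncludeAllInstances=True):
    for status in page["InstanceStatuses"]:
        for event in status.get("Events", []):
            print(status["InstanceId"], event["Code"],
                  event.get("NotBefore"), event["Description"])
```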
Wednesday, 24 September 2014
Forge Service restored
In the process of applying a critical security patch, the Forge SVN, WebDAV and Maven repositories were made inaccessible starting at roughly 19:00 GMT. Service was mostly restored by 19:15, with all services available by 19:50.