CloudBees Status: 2012

Monday, 19 November 2012

DEV@cloud Upgraded to Fedora 17/x86_64

DEV@cloud build machines now run Fedora 17 by default. Support for 32-bit build machines has been deprecated. You can temporarily use the legacy labels if either of these changes causes you any problems.

Monday, 15 October 2012

Packet loss resolved

The network problems are subsiding again and systems are returning to normal. We are actively monitoring and will move or restart apps as needed. The backlog has now been cleared. If you were affected, we apologise.

Packet loss update

Network issues are still intermittent, but we are working to minimise the impact on users' apps and services. Some applications are being restarted to migrate them to other locations, and backlogs/queues are being worked through. Thank you for your patience.

Network issues ongoing

There is packet loss in the US-East region which is affecting some apps and users. This is still ongoing and still intermittent.

Network problems

There are currently network problems affecting the US-East region. We are actively monitoring and migrating apps (and fixing deployments) as we encounter them.

The network problems are still ongoing at this time but appear to be stabilising.

Thursday, 6 September 2012

Log in outage - resolved

We are experiencing intermittent problems with the CloudBees login system and are investigating the cause. This problem may be preventing some customers from being able to log in to manage their CloudBees resources. Running applications are not affected by this problem.

Update [13:03 PT]: We have identified the problem and are working on a fix. At this time, signing in to CloudBees is working fine and we are monitoring for login problems while waiting for the final fix to be deployed.

Update [18:01 PT]: RESOLVED - logins have been stabilized for several hours and the permanent fix is scheduled to be released tonight.

Update [23:15 PT]: Permanent fix released and operating correctly.

Monday, 23 July 2012

Some analysis of DNS denial of service attack

You can read about the recent denial of service attack here.

New DNS provider

Our production DNS infrastructure has been migrated to a new provider. We are awaiting complete propagation, but services should be returning to normal for the majority of users.

DDOS of DNS

A distributed denial of service (DDOS) attack occurred (and is still happening) against our DNS provider (Zerigo). This left many users unable to look up DNS names (web addresses). We switched to a DNS server that was not under attack, and then to a new provider. However, DNS changes take time to propagate (how long often depends on your ISP). Services should have returned to normal for most users by now. More posts will follow with details on this and on mitigation.

Monday, 2 July 2012

Application Deployment Outage - Resolved

We detected a high number of errors related to application deployments. During the investigation, the API endpoint was taken offline and a fix was applied to deal with the error. Any applications that failed to deploy were restarted.

The API endpoint is now back online and all systems are operating normally.

Sunday, 1 July 2012

Applications restored

Maintenance work completed. Our apologies for any apps affected. Systems are normal now. Should you have specific problems with your apps from this point, please do issue a restart.

RUN@cloud rolling restarts

The restarts/maintenance work on RUN@cloud is taking longer than expected. Some applications may experience ongoing outages until it is completed. Sorry for the interruption to service.

Saturday, 30 June 2012

All systems operational

The power failure and associated infrastructure failures in Amazon US-EAST-1 impacted several CloudBees servers and core infrastructure.

We have since worked around this outage and brought replacement servers online in a different data-centre.

All systems are working correctly.

Friday, 29 June 2012

Power failure - Resolved

There has been a power outage in one of our provider's data-centres.

We are actively failing over services to another data-centre in the same region.

Monday, 18 June 2012

Shared routing service latencies resolved

We have made some changes that have resolved the problems with some apps in the US (those not using SSL) receiving 502 errors and generally being slower than usual.

Increased latencies for RUN@cloud apps on shared routing service

We are looking into slowness (increased request latencies) for RUN@cloud apps that use the shared routing service in the US. Apps that use SSL or their own routing service are not affected, nor are EU apps. We expect a resolution soon.

Friday, 15 June 2012

RUN@cloud back to normal

Normal service should be restored. We had to restart some applications (and there may be some limited application restarts to come). Many apologies.

Working through deployment problems

We are working through a list of apps that had deployment problems and restarting them. Things should resolve shortly. Sorry about the interruption.

RUN@cloud new deployment problems

We are working on problems with the RUN@cloud console and new deployments.

Thursday, 14 June 2012

Services back to normal

Service is restored (and the website is also back). Things are back to normal.

cloudbees.com (website) intermittent

We are working to restore the website, but people's apps and databases take priority. Grandcentral.cloudbees.com is available, as is the support site. Full service will be back shortly.

Some servers are unavailable

A data centre has gone offline (or is severely struggling) and this is affecting some applications and services. We are working to migrate applications and restore service ASAP.

Shared routing service problems resolved

All applications should be back to normal.

Problems with shared app routing service

We are working on fixing problems that the shared routing service (for *.cloudbees.net applications) is currently having. This is causing problems when some new applications are deployed (deployments may fail temporarily, or apps may be temporarily unavailable).

Those with dedicated routers (SSL service) should be unaffected.

We hope to have this resolved shortly.

Tuesday, 20 March 2012

DEV@cloud Private Repository Mounting

DEV@cloud private WebDAV repositories are now mounting without issue.

Thursday, 15 March 2012

Network issues for US regions resolved

All systems normal.

Current issues with US region

Some CloudBees services in the US region are unavailable due to network issues. We expect to resolve this shortly.

Wednesday, 14 March 2012

Grandcentral maintenance completed.

Thanks for your patience. If you have any issues with your account, please contact support.

Grandcentral currently down for critical maintenance

Expect it to be back very shortly. Thanks for your patience.

api.cloudbees.com now available

The upgrade has been completed and the issues resolved. Apologies for any new deployments that temporarily failed.

api.cloudbees.com is being upgraded

There are some issues with new application deployments, and this service is currently being upgraded to address them.

Sunday, 5 February 2012

RESOLVED - DEV@cloud build launch issues

DEV@cloud builds are progressing slowly. Individual builds run at normal speed, but the time taken to get a build underway is quite slow (and builds may appear stalled).

We are looking into it and expect to have a resolution in the next few hours.

UPDATE

Build launch issues have been resolved and all builds are now launching as normal. If you recently subscribed to DEV@cloud, your Jenkins will be available shortly.