Please see http://status.cloudbees.com for status indicators and high level system status information.
For support, please visit support.cloudbees.com or email support@cloudbees.com

Wednesday, 25 November 2015

Jenkins Master - Upgrade to Java 8

Overview

CloudBees has changed the configuration of all Jenkins masters to use Java 8 by default.

This modernizes our Java stack and provides a more easily supported environment for our Jenkins engineering team.

Version

  /opt/java8/bin/java -version
  java version "1.8.0_60"
  Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
  Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

Activating Java 8


To activate Java 8, restart your Jenkins.
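
After the restart, one way to confirm which Java version a master is running is to parse the banner printed by the version command shown in the Version section. The sketch below is illustrative only: the function name `java_major_version` is hypothetical, and it assumes the banner follows the `java version "1.8.0_60"` format shown above.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in output")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy numbering: "1.8.0_60" means Java 8.
    # Newer numbering: "11.0.2" means Java 11.
    if first == 1 and second is not None:
        return int(second)
    return first


# Banner line taken from the Version section of this notice.
print(java_major_version('java version "1.8.0_60"'))
```

Note that `java -version` writes its banner to standard error, so a script that runs the command needs to capture stderr rather than stdout.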

Regressions / Limitations


At this stage there are no known regressions when running on Java 8 - but please log a support ticket if you experience any issues.


Reverting to Java 7

To revert to Java 7, please log a support ticket requesting the downgrade.

Wednesday, 11 November 2015

Outage for Jenkins Security Release 1.609.4.3

Timeframe

Vulnerability public @ November 6th 2015 15:00 (UTC)
Vulnerability closed @ November 6th 2015 22:00 (UTC)
Vulnerability fixed @ November 9th 2015 04:00 (UTC)

Impact

  • CLI / OPE connectivity disabled

Root Cause

See https://www.cloudbees.com/jenkins-security-advisory-2015-11-06
The CloudBees response to the vulnerability announcement (see advisory) was to close the OPE/CLI TCP ports - and then to remove CLI functionality shortly thereafter. The ports were closed 7 hours after the vulnerability was made public.

On November 8th, a patch was released to close the vulnerability in the Jenkins server. We progressively rolled this patch out and re-activated the OPE/CLI functionality on all Jenkins services.

Data Loss / Security Implications

Indications are:

  1. there was no increase in traffic to the Jenkins servers we checked for breaches
  2. access to the CLI ports was closed 7 hours after the initial announcement
  3. the exploit as written doesn't work due to the network configuration of DEV@cloud
  4. the exploit is based on a commons-collections vulnerability announced early in 2015 - so there may have been unannounced vulnerabilities floating around the internet

Customers need to perform a risk assessment to determine whether they need to reissue credentials in their environment.

Followup

Our status notes are ephemeral - the overall outage notice was written and posted once the release had been completed.

Full information on the security vulnerability is available at:

https://www.cloudbees.com/jenkins-security-advisory-2015-11-06

Tuesday, 3 November 2015

DEV@cloud global restart - Java 7 update

We will be performing a Java upgrade and global restart of all Jenkins instances in DEV@cloud.

Purpose:

  • patch Java 7 to latest update
  • deploy Java 8 so it can be used for beta customers (in preparation for global rollout)
  • allow individual customers to be switched to Java 8

Window

  • 4th November 7am UTC - 9am UTC

Impact

The outage will be momentary for customers as their Jenkins restarts.

Due to how this patch to the environment is applied it is not possible for us to hold off this restart for individual customers.

Our monitoring systems will tell us if your Jenkins has not come back up cleanly. However, in the event that you do experience issues, please raise a support request via the normal means.

Post Outage Review

There were a small number of Jenkins servers in our production environment running an older base operating system.  These older instances did not upgrade to our satisfaction - and so we made the decision to terminate these instances and reprovision customer Jenkins on newer and faster hardware.

While this was not ideal timing, the work was completed largely within the outage window - but not as quickly as we would have liked.

Improvements

We are reviewing the way we communicate outages with customers - in this case we did not have sufficient time (for operational scheduling reasons) to communicate this particular upgrade.

We are also reviewing the Jenkins behaviour of displaying a stack-trace to the user rather than something more useful.

There are also changes being made to our hosted Jenkins platform to improve the resilience and stability.

Tuesday, 20 October 2015

DEV@cloud CA Certificate Issue - 21 October 2015


Timeframe (UTC)

October 20 2015 4am - October 21 2015 2am

Impact

  • Jenkins master access to HTTPS services using command line tools would fail due to missing Root CA certificate chain

Root Cause

A component on the Jenkins master instances was upgraded - however due to a failure in the package system, the Root CA certificate list (that lives on-disk in a ca-certificates.crt file) was no longer available.

As this file was missing, anything that relied on its existence was no longer able to access HTTPS protected services - this was typically limited to command line tools such as curl and git.

Resolution


The Root CA certificate list was reinstalled.

Data Loss / Security Implications

There are no data-loss or security implications.

Followup

  • We are improving the robustness of our testing and change control processes to help limit and subsequently eliminate failures of this nature in our upgrade process.
  • We are amending our status monitoring to detect this fault (our monitoring jobs all connect to Git over SSH - and hence did not fail under this scenario)
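
As an illustration of the kind of check that would have caught this fault, the sketch below tests whether a Root CA bundle file exists and still contains certificates. It is not the CloudBees monitoring job: the function names are hypothetical, and the default path `/etc/ssl/certs/ca-certificates.crt` is an assumption (this notice only names the file ca-certificates.crt, not its location).

```python
from pathlib import Path

# Assumed location of the bundle; the notice does not state the directory.
DEFAULT_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"
PEM_MARKER = "-----BEGIN CERTIFICATE-----"


def count_ca_certificates(bundle_path: str = DEFAULT_BUNDLE) -> int:
    """Return the number of PEM certificates in the bundle, or 0 if it is missing."""
    path = Path(bundle_path)
    if not path.is_file():
        return 0
    text = path.read_text(encoding="utf-8", errors="replace")
    return text.count(PEM_MARKER)


def ca_bundle_is_healthy(bundle_path: str = DEFAULT_BUNDLE, minimum: int = 1) -> bool:
    """Report whether the bundle holds at least `minimum` certificates."""
    return count_ca_certificates(bundle_path) >= minimum


if __name__ == "__main__":
    print("CA bundle healthy:", ca_bundle_is_healthy())
```

A check like this only verifies the file on disk. A fuller monitor would also make an HTTPS request with a command line tool, since those were the tools that failed during this incident.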

Saturday, 17 October 2015

DEV@cloud Build Interruption - 17 October 2015

Timeframe

5am-7am 17th October 2015 (UTC)

Impact

  • New builds unable to launch during outage

Root Cause

Dynamic DNS entries for the CloudBees build system were not updated correctly, taking the build service offline.

Data Loss / Security Implications

There was no data loss or security impact from this outage.

Followup

  • The impacted DNS update tools are being reviewed to determine the root cause of the faulty updates.  
  • The problem is intermittent and ongoing - however the system is being run manually (with verification) to maintain availability.
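
The manual verification mentioned above could be approximated by comparing what a hostname currently resolves to against the addresses it is expected to have. This is a minimal sketch under stated assumptions, not the CloudBees tooling: the function names are hypothetical, and the expected addresses would have to come from whatever system records where the build service actually runs.

```python
import socket
from typing import Callable, Iterable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def dns_record_is_current(
    hostname: str,
    expected: Iterable[str],
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> bool:
    """Report whether hostname resolves to exactly the expected addresses."""
    try:
        actual = resolver(hostname)
    except socket.gaierror:
        # A name that does not resolve at all counts as a failed update.
        return False
    return actual == set(expected)
```

For example, `dns_record_is_current("build.example.com", ["10.0.0.5"])` (both values are placeholders) would return False if the record were stale or missing. The `resolver` parameter exists so the comparison can be exercised without network access.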

Tuesday, 25 August 2015

DEV@cloud Build Outage - 2015-Aug-25

Timeframe

25th August 2015 - 12:00am UTC to 3:00am UTC.

Scope of outage - DEV@cloud

  • No new builds could be launched on DEV@cloud provided executors.
  • Existing builds were able to complete.
  • On-Premise Executors (OPE) and builds were not impacted.

Root Cause

A network configuration issue occurred, preventing communication between build web-services.

Data Loss

No builds in flight were lost, and no data was lost during this outage.

Complications

The recovery from the outage also took longer than expected as we needed to increase the size of our build farm to catch up with the builds that had stacked up during the outage. Dynamically sizing the farm is not particularly fast as we usually don't need to ramp up capacity that quickly.

Proactive Steps

  • catalogue and monitor the impacted internal service directly
  • increase capacity scaling rate
  • implement further changes identified in internal Post Outage Review



Tuesday, 21 July 2015

Maven Repository Server (repo.cloudbees.com) Outage

The main Maven repository server which is used by DEV@cloud builds went offline earlier today. This may have resulted in hung or failed builds.

We have restarted the server, and it recovered after the reboot.