News and Announcements

Spectre/Meltdown Performance Impact

Posted: 2018-04-06 13:59:59   Expiration: 2018-07-02 13:59:59

Disclaimer: This news item was originally posted on 2018-04-06 13:59:59. Its content may no longer be timely or accurate.

Executive Summary: The Spectre & Meltdown patches are a complicated issue. They raise serious security, performance, political, and budgetary concerns, and quite a few variables determine whether a given workload will see a performance impact. As such, it is very difficult to generalize about the impact workloads will see, but there are tools and techniques available to CCI Virtualization customers to mitigate problems should they arise.

First, Spectre & Meltdown are problems with very few mitigating controls. An attacker can essentially listen to what a CPU is doing for someone else and gain access to sensitive data such as encryption keys and passwords. Firewalls and antivirus software do not effectively protect against this threat. Patching does.

Second, security updates are mandated on our campus by the CISO & UW Cybersecurity groups as part of the Departmental IT Security Baseline. The Baseline states that “All server operating systems must have critical and security patches applied within 30 calendar days of release.” With recent high-profile security issues inside the UW System, this is being scrutinized at the Board of Regents level.

Third, there are performance degradations associated with the patches. The amount of degradation depends on the operating system, the type of workload, and the generation of the underlying hardware. It also depends on mitigating circumstances such as spare CPU capacity. A workload may simply consume more CPU to do the same amount of work, but if there isn’t extra CPU available then a slowdown may be seen. Other workloads, especially I/O-intensive ones, will see a slowdown, but it is impossible to predict how much.
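
The headroom argument above can be put in rough numbers. The following is a minimal Python sketch, not a measurement tool. It assumes a simple model in which a performance loss of d percent means the same work needs 1/(1 - d/100) times as much CPU. The function names and the sample utilization and loss figures are hypothetical, chosen only for illustration.

```python
def required_cpu(baseline_pct: float, loss_pct: float) -> float:
    """Estimate CPU utilization (percent) needed after patching.

    Assumed model (a simplification, not taken from any vendor): a
    performance loss of loss_pct means each unit of work costs
    1 / (1 - loss_pct/100) times as much CPU as before.
    """
    if not 0 <= loss_pct < 100:
        raise ValueError("loss_pct must be in the range [0, 100)")
    return baseline_pct / (1.0 - loss_pct / 100.0)


def has_headroom(baseline_pct: float, loss_pct: float,
                 capacity_pct: float = 100.0) -> bool:
    """Return True if the workload still fits within its available CPU."""
    return required_cpu(baseline_pct, loss_pct) <= capacity_pct


if __name__ == "__main__":
    # Hypothetical workloads: (name, baseline CPU %, assumed loss %).
    samples = [("lightly loaded VM", 40.0, 10.0), ("busy VM", 95.0, 10.0)]
    for name, baseline, loss in samples:
        needed = required_cpu(baseline, loss)
        verdict = "fits" if has_headroom(baseline, loss) else "likely slowdown"
        print(f"{name}: needs about {needed:.1f}% CPU -> {verdict}")
```

Under this model the lightly loaded VM absorbs the loss by using more CPU, while the busy VM would need more CPU than it has and slows down. The model says nothing about I/O-intensive workloads, whose slowdown cannot be predicted this way.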

These performance impacts are felt equally by departmental workloads, CCI workloads, and workloads in public clouds like AWS and Azure. For instance, AWS and Azure are built with mixes of many different ages & types of CPUs, selected randomly when a VM is instantiated. End users have little control over where they are placed, and if they dislike a type of CPU they must instantiate new VMs in the hope of being placed on newer hardware, all while incurring charges of at least an hour per instance. Netflix has gone to great lengths to game this process because their scale means that it saves them money. The rest of us don’t have workloads that sensitive to the performance hits, and the staff time investment isn’t worth the return.

In CCI Virtualization we do have control of workload placement and have tools to assess the performance of particular workloads. As of April 2018 CCI Virtualization is a mix of 12G Dell hardware running E5-2697 v2 CPUs (Ivy Bridge) and 13G Dell hardware running E5-2697A v4 CPUs (Broadwell), with a large-scale equipment refresh happening in 2H 2018. Workloads can be pinned to a particular architecture if necessary. To date we have done this for only one workload. For numerous others we have suggested changes in CPU, RAM, and storage that have drastically improved performance in other ways. Please reach out to us if you would like us to provide guidance for your workloads.

As of this writing, 84% of CCI Virtualization customers are running operating system levels that include Spectre & Meltdown protections. Microsoft is understandably vague in indicating what performance decrease we can expect. Red Hat (link below) has used some benchmarks to help assess the impact on Enterprise Linux 7, which should hold for CentOS 7 as well. They saw a 1-8% performance decrease on the newer Broadwell CPUs. Intel has indicated a performance loss of 1-18% on “every CPU released in the last five years,” which includes our oldest Ivy Bridge CPUs. Remember, though, that benchmarks rarely resemble our workloads.

CCI Virtualization has been running server firmware that mitigates the Spectre & Meltdown issues since January 2018. Final versions of that firmware and of the VMware vSphere hypervisor were released & applied in mid-March 2018. To complete the protections for the environment, the virtual machines need to be fully powered off and then powered back on. In this process the infrastructure will upgrade their virtual hardware to protect them against malicious neighbors listening to their CPU system calls. All public clouds have done this forcibly to their customers, either in January or March 2018, causing outages for VMs.

When we discussed this with them, UW Cybersecurity & the UW-Madison CIO’s Office indicated that we should offer CCI Virtualization customers a month to power-cycle their VMs on their own schedule, and after that take no more than an additional month to remediate any remaining virtual machines on a schedule of our choosing. That is the basis for the schedule that was published.

CCI Virtualization customers still need to manage their operating system’s updates as well, through Windows Update, yum, or apt-get as appropriate. Feel free to create a snapshot of your virtual machine before patching so that, if there are problems, you can roll back easily. CCI Virtualization also offers several ways to replicate your virtual servers to an alternate site which can serve as both a periodic snapshot and a DR/BC/COOP strategy for your important workloads.
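
After patching a Linux guest, one way to see what the running kernel reports is to read the status files under /sys/devices/system/cpu/vulnerabilities, which patched kernels commonly expose. The sketch below is a hedged Python example, not a CCI-provided tool. It assumes that directory exists on your kernel (older or unpatched kernels may not have it), the function names are hypothetical, and it does not cover Windows.

```python
from pathlib import Path

# Assumption: the kernel exposes this sysfs directory. Not all kernels do.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")

# File names as used by the kernel: meltdown is CVE-2017-5754,
# spectre_v1 is CVE-2017-5753, spectre_v2 is CVE-2017-5715.
NAMES = ("meltdown", "spectre_v1", "spectre_v2")


def read_mitigation_status(vuln_dir=VULN_DIR, names=NAMES):
    """Return {name: status text} as reported by the running kernel."""
    status = {}
    for name in names:
        try:
            status[name] = (Path(vuln_dir) / name).read_text().strip()
        except OSError:
            # File missing or unreadable: the kernel may predate the interface.
            status[name] = "unknown (status file not available)"
    return status


def needs_attention(status):
    """List entries that are neither mitigated nor reported as not affected."""
    ok_prefixes = ("Mitigation", "Not affected")
    return [name for name, text in status.items()
            if not text.startswith(ok_prefixes)]


if __name__ == "__main__":
    current = read_mitigation_status()
    for name, text in current.items():
        print(f"{name}: {text}")
    flagged = needs_attention(current)
    if flagged:
        print("Review patch level for:", ", ".join(flagged))
```

An "unknown" result does not prove the system is unprotected. It only means this particular interface was not available, so the patch level should be confirmed through yum or apt-get instead.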

If you have questions or comments, would like to talk to us about availability, or would like to work with us to assess any performance issues, please reach out to CCI Virtualization at cci-virtualization@wisc.edu.

Thank you.

Departmental IT Security Baseline

Speculative Execution Exploit Performance Impacts - Describing the performance impacts to security patches for CVE-2017-5754 CVE-2017-5753 and CVE-2017-5715

Firmware Updates and Initial Performance Data for Data Center Systems

-- CCI: Drew Denson