Recently, the XenServer Engineering Team has been working on improving the responsiveness of the control domain when it is under heavy load. Many VMs doing lots of I/O can prevent you from connecting to the host over ssh, or cause the XenCenter session to disconnect for no apparent reason. This happens when the control plane is starved by the datapath plane, leaving very little CPU for important processes such as sshd or xapi. Let’s have a look at how long it takes to repeatedly execute a simple xe vm-list command on a host with 20 pairs of VMs doing heavy network communication:

[Graph: execution times of repeated xe vm-list commands under heavy I/O load, before the changes]

Most of the commands took around 2-3 seconds to complete, but some took 20-30 seconds. Two to three seconds is slower than it should be, and 20-30 seconds is way outside a reasonable operating window. The sluggish 2-3 second baseline and the heavy spikes of up to 30 seconds visible on the graph above are two separate issues affecting the responsiveness of control commands. Let’s tackle them one by one.

To fix the 2-3 second slowdown, we took advantage of the Linux kernel feature called cgroups (control groups). Cgroups allow processes to be aggregated into separate hierarchies that manage their access to various resources (CPU, memory, network). In our case we used CPU resource isolation, placing all control-path daemons in the cpu control group subsystem. Giving them ten times the CPU share of the datapath processes guarantees they get enough computing power to execute control operations in a timely fashion. It is worth pointing out that this does not slow down the datapath while the control plane is idle: the datapath only gives up CPU when control operations actually need to run.
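As a rough illustration of how CPU shares behave (a hand-rolled sketch with made-up group names and placeholder PIDs, not the actual XenServer setup, which uses the systemd slice described further down), one could create two groups on a host with the cgroup v1 cpu controller mounted under /sys/fs/cgroup/cpu:

# mkdir /sys/fs/cgroup/cpu/demo-control /sys/fs/cgroup/cpu/demo-datapath
# echo 10240 > /sys/fs/cgroup/cpu/demo-control/cpu.shares
# echo 1024 > /sys/fs/cgroup/cpu/demo-datapath/cpu.shares
# echo $CONTROL_PID > /sys/fs/cgroup/cpu/demo-control/tasks
# echo $DATAPATH_PID > /sys/fs/cgroup/cpu/demo-datapath/tasks

When both groups are busy, the process in demo-control gets roughly ten times the CPU time of the one in demo-datapath; when demo-control is idle, demo-datapath is free to use the whole CPU, which is exactly the behaviour described above.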

[Graph: execution times of repeated xe vm-list commands with the control-path daemons isolated in their own cpu cgroup]

We can see that the majority of the commands took just a fraction of a second to execute, which solves the first of our problems.

What about the commands that took 20-30 seconds to print the list of VMs? This was caused by the way xapi handles the creation of threads, limiting the rate based on the current load and memory usage in dom0. When the load gets too high, there are not enough xapi threads to handle the requests, which results in the periodic spikes in the execution of xe commands. This feature was useful when dom0 was 32-bit and an increased number of threads might have compromised the stability of the whole system. Now that dom0 is 64-bit, and with the control groups enabled, we decided it is perfectly safe to get rid of xapi’s thread-limiting feature.

With these changes applied, the execution times of control-path commands became what one would expect:

[Graph: execution times of repeated xe vm-list commands with both the cgroup isolation and the xapi thread-limiting fix applied]

In spite of heavy I/O load, control-path processes receive all the CPU they need to get the job done, so they can do it without delay, leaving the user with a nicely responsive host regardless of the load in the guests. This makes a tremendous difference to the user experience when interacting with the host via XenCenter, the xe CLI or SSH.

Another real-world example where we expected significant improvements is a bootstorm. In this benchmark we start more than a hundred VMs and measure how long it takes for the guests to become fully operational (time measured from starting the 1st VM to the completion of the n-th VM). The usual strategy is to start 25 VMs at a time. Here is a comparison of the results before and after the changes:

[Graph: cumulative bootstorm time for consecutive VMs, before and after the changes]

Before, booting guests overloaded the control path, which slowed down the boot process of the later VMs. After our improvements, the time to boot consecutive guests grows linearly, with the whole benchmark completing twice as fast compared to the build without the changes.

Another view of the same data, showing the time to boot a single VM:

[Graph: time to boot each individual VM during the bootstorm, before and after the changes]

CPU resource isolation and the xapi improvements make VMs resilient to the load generated by the guests booting alongside them. Each VM takes roughly the same amount of time to become ready, in contrast to the significant increase seen on the host without the changes. That is how you would expect the control plane to operate.
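For anyone wanting to drive a similar bootstorm from the command line, a rough sketch with the xe CLI might look like the following (the batching logic and the assumption that every halted guest takes part are ours; the real benchmark also waits for each guest to become fully operational, which this sketch does not do):

#!/bin/sh
# Start all halted guests in batches of 25 and time how long the starts take.
BATCH=25
start=$(date +%s)
i=0
for vm in $(xe vm-list is-control-domain=false power-state=halted --minimal | tr ',' ' '); do
    xe vm-start uuid="$vm" &
    i=$((i + 1))
    [ $((i % BATCH)) -eq 0 ] && wait
done
wait
echo "All vm-start calls completed after $(( $(date +%s) - start )) seconds"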

What other benefits do these improvements bring to XenServer users? There should be no more problems synchronizing XenCenter with the host or issuing commands to xapi. We expect that XenDesktop users will be able to start many VMs on the pool master while it remains responsive to control-path commands. This allows them to run more VMs on the master, reducing the necessary hardware and decreasing the total cost of ownership. Cloud administrators can have increased confidence in their ability to administer the host despite unknown and unpredictable workloads in their tenants’ VMs.

TECHNICAL DETAILS

For anyone interested in playing around with the new feature, here are some details of the implementation and the organisation of the files in dom0.

All the changes are contained in a single rpm called control-slice.

The control-slice itself is a systemd unit that defines a new slice to which all control-path daemons are assigned. You can find its configuration in the following file:

# cat /etc/systemd/system/control.slice 
[Unit]
Description=Control path slice
[Slice]
CPUShares=10240

By modifying the CPUShares parameter, one can change the CPU share that control-path processes will get. Since the base value is 1024, assigning shares of, for example, 2048 would mean that control-path processes get twice the processing power of datapath processes. The default value for the control-slice is 10240, which means control-path processes get up to ten times more CPU than the datapath. To apply a change, reload the configuration and restart the control.slice unit:

# systemctl daemon-reload
# systemctl restart control.slice
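To check that the new value is actually in effect, one can query systemd for the property, or read it straight from the cgroup filesystem (the second path assumes the v1 cpu controller is mounted under /sys/fs/cgroup/cpu, the usual layout on a systemd host of this vintage); both should report the value currently applied, 10240 by default:

# systemctl show control.slice -p CPUShares
# cat /sys/fs/cgroup/cpu/control.slice/cpu.shares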

Each daemon that belongs to the control-slice has a simple configuration file specifying the name of the slice it belongs to; for example, for xapi we have:

# cat /etc/systemd/system/xapi.service.d/slice.conf 
[Service]
Slice=control.slice
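The same pattern can be used to pull any other service into the slice. As a purely illustrative example (the service name below is just a stand-in; check which drop-ins the control-slice rpm actually ships), a drop-in plus a reload and restart is all that is needed:

# mkdir -p /etc/systemd/system/some-daemon.service.d
# cat /etc/systemd/system/some-daemon.service.d/slice.conf
[Service]
Slice=control.slice
# systemctl daemon-reload
# systemctl restart some-daemon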

Last but not least, systemd provides admins with a powerful utility for monitoring cgroup resource utilisation. It can be examined by typing the following command:

# systemd-cgtop
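Its companion, systemd-cgls, dumps the whole cgroup hierarchy as a tree, which is a quick way to confirm that the control-path daemons really did end up under control.slice:

# systemd-cgls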

The above improvements are planned for the forthcoming XenServer Dundee release, and can already be experienced with the Dundee beta.2 preview. Let us know if you like them and whether they make a difference for you!
