In this post I’m looking at reclaiming resources from virtual environments. You might be saying that’s simple; VMware put out a white paper on cycle harvesting: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/vmware-jackson-national-life-insurance-white-paper.pdf That paper was based on what a VMware customer did to reclaim idle CPU resources from their VDI environment.
They did this by setting the share values on their virtual desktops to a high number and setting their harvesting VMs to a very low share number. Thus, whenever there is contention for resources, the virtual desktops (or primary VMs) win. That paper, however, doesn’t cover harvesting cycles from vGPUs. That is something that can’t be done by just adjusting share values for virtual machines.
The reason is that vGPUs aren’t shared the same way CPUs and RAM are. CPUs and RAM are pooled resources that are not mapped to a specific VM. (That is, unless you pin a processor, there is no guarantee that VM x will always use a given processor.) In contrast, vGPUs (and the links to the underlying GPU) are mapped to a specific VM. This mapping makes it difficult to “pool” the video RAM and Streaming Multiprocessors (SMs) into a single consumable resource shared by all VMs. (See vGPU constructs below.) Because of these bindings there is no share value for GPU resources. This is also part of the reason vGPU vMotion was difficult; you can read more about it in Sean Massey’s blog.
So how does one go about taking back (harvesting) resources with vGPUs?
It’s a multi-step process. In this blog I’ll walk through the theory of how to harvest vGPU resources and save the scripting and implementation for another time. The main parts are setting up the primary VMs (presumably VDI) and harvesting VMs, creating scripts to monitor primary VMs, and scripts to resume/suspend harvesting VMs.
To start with, a set of harvesting VMs will need to be created to run the processes that benefit from the spare cycles. These systems are set up like other processing nodes at the VM level. In fact, these VMs could be copied from the same template used for the compute environment that will benefit from the spare cycles.
It is important to note that the harvesting VMs’ vGPU profiles must match the vGPU profiles used on the primary VMs (presumably VDI). In other words, if the virtual desktops use a P40-4Q profile, then the harvesting VMs would also need a P40-4Q profile. This also requires the NVIDIA Quadro Virtual Data Center Workstation (Quadro vDWS) license for your vGPU deployment.
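A quick sanity check for this constraint could live in the harvesting scripts. Here’s a minimal sketch (the profile strings and VM lists are illustrative, not pulled from any API):

```python
# Sketch: confirm every harvesting VM uses the same vGPU profile as the
# primary VMs before trying to power one on. Profile strings like
# "P40-4Q" are examples; substitute the profiles from your deployment.

def profiles_match(primary_profile: str, harvesting_profiles: list[str]) -> bool:
    """Return True only if every harvesting VM carries the same vGPU
    profile as the primary VMs."""
    return all(p == primary_profile for p in harvesting_profiles)

print(profiles_match("P40-4Q", ["P40-4Q", "P40-4Q"]))  # True
print(profiles_match("P40-4Q", ["P40-4Q", "P40-2Q"]))  # False: mismatched profile
```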
It should also be pointed out that there are specific licensing requirements that need to be met when using a VDI environment for cycle harvesting. It is best to refer to your VMware licensing provider for details. I call this out to make sure licensing and usage compliance is adhered to. The VMware End-User-Computing Packaging and Licensing white paper says “[…] Horizon Editions include vSphere Desktop in the bundle, which is licensed in such a way that it may only be used in conjunction with exclusively desktop environments. For mixed workloads, we recommend buying an edition of vSphere, which is licensed per CPU and buying a Horizon Add-on to run on top of that.”
This should take care of the actual VMs. (Not that bad, eh?) We will still need to make one modification to them, though: setting share values for both the VDI and harvesting VMs. The desktop VMs will be set to a high number, let’s say 1000, and the harvesting VMs to a low share value, let’s say 10. (The share settings for vCPU can be seen in the screen capture.)
Setting these share values may seem strange, but it’s actually an important step in the process. We want to make sure that if for some reason there is contention between the primary VMs (virtual desktops) and the harvesting VMs, the primary VMs get the resources they need and the harvesting VMs give up their cycles.
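To see why those numbers matter, here is a rough simulation of proportional-share math. This is not the ESXi scheduler itself, just the basic idea behind it: under contention, capacity is split in proportion to share values.

```python
def divide_contended_capacity(shares: dict[str, int], capacity_mhz: float) -> dict[str, float]:
    """Split contended capacity proportionally to share values
    (a simplification of what a proportional-share scheduler does)."""
    total = sum(shares.values())
    return {vm: capacity_mhz * s / total for vm, s in shares.items()}

# One desktop VM (1000 shares) vs. one harvesting VM (10 shares)
# contending for 2000 MHz of CPU:
alloc = divide_contended_capacity({"desktop": 1000, "harvester": 10}, 2000)
print(alloc)  # the desktop gets ~99% of the contended capacity
```

With a 100:1 share ratio the harvesting VM is effectively starved the moment a desktop needs the resources, which is exactly the behavior we want.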
That brings us to dealing with the vGPU resources. How can we harvest excess vGPU cycles? Remember, we can’t just grab a chunk of time or space like we can with vCPU and vRAM. vGPUs have to be fully allocated to a VM.
There is nothing magic about this: VMs are built with their vGPUs attached. When those VMs are powered on, they are worked into the time-slicing schedule of the GPU and allocated video RAM. When all the time slices/video RAM are consumed, no additional vGPU VMs can power on on the host. (More information about vGPU scheduling can be found at https://1drnrd.me/GPUQoS)
You’re probably thinking: if you can only have so many VMs using the vGPU ‘slots’ at any given time, how are you going to share them between the primary VMs and the harvesting VMs? The simple answer is they won’t run at the same time. You can suspend the harvesting VMs, which keeps their state but releases their vGPU resources. (Check out this blog post on it.) That means the primary machines can consume those vGPU ‘slots,’ effectively over-committing vGPU resources on a given ESXi host. (Watch the video below for more on vGPU suspend and resume technology.)
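The over-commit trick boils down to simple slot bookkeeping: a host has a fixed number of vGPU ‘slots,’ suspending a VM returns its slot, and resuming or powering on takes one. A toy model (host and VM names are made up for illustration):

```python
class VgpuHost:
    """Toy model of vGPU slot accounting on a single ESXi host."""

    def __init__(self, total_slots: int):
        self.total_slots = total_slots
        self.running: set[str] = set()

    def power_on(self, vm: str) -> bool:
        if len(self.running) >= self.total_slots:
            return False          # no free slot: the power-on would fail
        self.running.add(vm)
        return True

    def suspend(self, vm: str):
        self.running.discard(vm)  # state saved to disk, vGPU slot released

host = VgpuHost(total_slots=2)
host.power_on("harvest-1")
host.power_on("harvest-2")
print(host.power_on("desktop-1"))  # False: all slots taken by harvesters
host.suspend("harvest-2")          # release a slot for the desktop
print(host.power_on("desktop-1"))  # True
```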
How do we take advantage of all of this? Scripting! In this blog I won’t create the scripts needed but will describe what they need to do to make cycle harvesting possible.
First we need a script to watch for empty space in the environment. One method would be to track the total number of available vGPU ‘slots,’ the number of spare primary VMs, and the total number of running VMs. (You would also want to keep a slight reserve, say one or two ‘slots’ that aren’t used for harvesting, or adjust the provisioning timeout for spares.) Then any time a slot opens up, resume one of the harvesting VMs; when the spare count is decremented, suspend one of the harvesting VMs. This way there will always be d VMs running in the environment. (See flow chart below.)
a [Primary VMs] + b [Spare Primary VMs] + c [Active Harvesting VMs] = d [Total Running VMs], where d = the maximum number of vGPU slots
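Under that formula, the monitoring script’s core decision reduces to a little arithmetic: given the slot total, the reserve, and the current primary and spare counts, how many harvesting VMs should be active, and do we need to resume or suspend to get there? A hedged sketch (function names and the reserve default are my own):

```python
def target_harvesters(max_slots: int, primaries: int, spares: int, reserve: int = 1) -> int:
    """How many harvesting VMs should run so primaries + spares + harvesters
    fills the vGPU slots, minus a small reserve held back from harvesting."""
    return max(0, max_slots - reserve - primaries - spares)

def actions(active_harvesters: int, target: int) -> str:
    """What the script should do this polling cycle."""
    if target > active_harvesters:
        return f"resume {target - active_harvesters}"
    if target < active_harvesters:
        return f"suspend {active_harvesters - target}"
    return "no change"

# 16 slots, 1 in reserve, 10 primaries, 2 spares -> 3 harvesters should run
t = target_harvesters(16, primaries=10, spares=2, reserve=1)
print(t, "->", actions(active_harvesters=1, target=t))  # 3 -> resume 2
```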
This of course should be done in a round-robin rotation so tasks don’t become stale inside the harvesting VMs.
This design is a very simplistic approach to managing the VMs. In reality, a lot of testing needs to occur to make sure everything starts and stops without a hitch, and that VMs are on the right host at the right time. You may also not want to suspend the longest-running harvesting VM, as it may be close to returning its results. All of this should be taken into consideration in the harvesting scripts.
With an appropriately crafted script that suspends and resumes the harvesting VMs, and with share values to ensure the other compute resources (CPU and RAM) always go to the primary VMs first, it should be possible to harvest spare vGPU cycles from your virtual environment.
This is one more reason it’s worth considering virtualizing workloads like High Performance Computing (HPC), Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). No longer are you stuck with silos of compute resources; you can reclaim the cycles that aren’t being used.
Later on I will create the scripts to do this and write up my validation. Till then this is just the theory of how to harvest vGPU resources. 😉