You’re in the right place if you are looking for resources from VMworld Code Session 2778 – Talk Nerdy to Me, Using Python to Create VMs with vGPUs for AI Workloads. Thank you to everyone who was able to join the session live. For those who weren’t able to attend the live session, you can catch the replay of it on VMworld.com.
Below you will find the slide deck I used for the session, which includes the code snippets I showed during the presentation. You can also find the fully functional blocks of code on GitHub in my VMworld21 repo; you’ll just need to update the credentials and vCenter address.
This is a short 30-minute session, and there’s a lot to cover when working with GPUs and VMs. Hopefully this session covers the material you need to get started automating the creation of VMs with GPUs for AI/ML/DL/HPC workloads. Because everyone’s situation is different, it’s very difficult to create a single script that fits everyone.
UPDATE: The VMware {code} team posted the videos to YouTube, so you can watch my session below.
The Code
The way I’ve set the table for this session is starting out with a connection to the vCenter (I know it seems simple). From there we capture the GPUs in a host as well as what vGPU profiles those GPUs/hosts support. Next we add a vGPU to a VM programmatically and then remove it. Once we’ve worked through those basics we bring it all together and create a VM with a vGPU.
Now let’s break down those parts and why they are important. First of all, it’s important to know that we actually have GPUs in the hosts we are working with. By using the first script to detect GPUs in the host, we can do a safety check on our hosts to make sure we are able to do what we want. There are also many other things we can do with that information.
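As a rough illustration of that safety check (a sketch, not the code from the slide deck or the GitHub repo), the snippet below connects to vCenter and summarizes the GPUs each host reports. It assumes pyVmomi is installed and reads the `graphicsInfo` list from each host’s config. The function names `connect`, `all_hosts`, and `host_gpus` are made up for this example.

```python
import ssl


def connect(vcenter, user, password, verify_tls=True):
    """Open a session to vCenter and return the service instance."""
    from pyVim.connect import SmartConnect  # assumes: pip install pyvmomi

    context = ssl.create_default_context()
    if not verify_tls:
        # Lab use only: skip certificate checks for self-signed vCenter certs.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return SmartConnect(host=vcenter, user=user, pwd=password,
                        sslContext=context)


def all_hosts(service_instance):
    """Return every HostSystem visible in the inventory."""
    from pyVmomi import vim

    content = service_instance.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    try:
        return list(view.view)
    finally:
        view.Destroy()


def host_gpus(host):
    """Summarize the physical GPUs listed in host.config.graphicsInfo."""
    config = host.config
    if config is None:  # e.g. a disconnected host
        return []
    graphics = getattr(config, "graphicsInfo", None) or []
    return [
        {
            "name": gpu.deviceName,
            "vendor": gpu.vendorName,
            "pci_id": gpu.pciId,
            "type": gpu.graphicsType,
            "memory_mb": gpu.memorySizeInKB // 1024,
        }
        for gpu in graphics
    ]
```

A typical use would be to call `connect(...)`, loop over `all_hosts(si)`, and skip any host where `host_gpus(host)` comes back empty.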
Next, when creating an AI VM, you need to know what profiles are available to consume in that VM. It’s better to do this programmatically than to manually assign profiles. This is especially important when you have multiple GPU types in your environment and you need the right one for the right VM. Malformed VMs won’t start, and that leads to service problems and user dissatisfaction. This is why knowing the available vGPU profiles is important.
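One way to look up profiles programmatically is sketched below. It assumes each host exposes its supported vGPU profiles as a list of strings in `host.config.sharedPassthruGpuTypes`, and that profile names look like `grid_p4-8q` (check what your own hosts actually report). The helper names are hypothetical.

```python
def host_vgpu_profiles(host):
    """Return the vGPU profile names a host reports, or an empty list."""
    config = host.config
    if config is None:  # e.g. a disconnected host
        return []
    return list(getattr(config, "sharedPassthruGpuTypes", None) or [])


def hosts_supporting(hosts, profile):
    """Filter hosts down to those that advertise the requested profile."""
    wanted = profile.lower()
    return [
        host for host in hosts
        if wanted in (name.lower() for name in host_vgpu_profiles(host))
    ]
```

Checking the profile against this list before touching a VM is a cheap way to avoid building a VM that will never power on.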
Now we get into the actual meat of this: adding a vGPU to an existing VM. This may be all some folks need to do. They may already have VMs created and just need to add a vGPU profile to them. It’s really the foundational process for getting a vGPU into a VM, whether it’s at creation or later on.
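A minimal sketch of that step, assuming pyVmomi and a `vm` object already looked up from the inventory, might look like the following. It builds a `VirtualPCIPassthrough` device with a `VmiopBackingInfo` backing that names the profile, then reconfigures the VM. The power-state guard reflects the assumption that passthrough devices can’t be hot-added. `add_vgpu` is a name chosen for this example, not the function from the session.

```python
def add_vgpu(vm, profile):
    """Attach one vGPU with the given profile name to a powered-off VM.

    Returns the vCenter task so the caller can wait on it.
    """
    # Assumption: the VM has to be powered off before adding the device.
    if str(vm.runtime.powerState) != "poweredOff":
        raise RuntimeError("power off the VM before adding a vGPU")

    from pyVmomi import vim  # assumes: pip install pyvmomi

    # The backing tells vSphere which vGPU profile to use, e.g. "grid_p4-8q".
    backing = vim.vm.device.VirtualPCIPassthrough.VmiopBackingInfo(
        vgpu=profile)
    # A negative key is a temporary placeholder; vCenter assigns the real one.
    device = vim.vm.device.VirtualPCIPassthrough(key=-1, backing=backing)
    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        device=device)
    spec = vim.vm.ConfigSpec(deviceChange=[change])
    return vm.ReconfigVM_Task(spec=spec)
```

In practice you would wait for the returned task to finish and check its result before powering the VM on.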
Next we perform some cleanup of the VM. This isn’t strictly necessary, but it’s helpful to know how to do it. This set of code modifies the VM and removes the last vGPU in the VM. If you want to remove a specific vGPU instead, you’ll need to tighten the condition in the for loop so it picks the one you want.
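The removal could be sketched roughly as below. This is an illustration under assumptions, not the session code: it treats any device whose backing has a non-empty `vgpu` attribute as a vGPU, and the optional `profile` argument stands in for the “more specific” selection. Function names are hypothetical.

```python
def vgpu_devices(vm, profile=None):
    """List the VM's vGPU devices, optionally only those using `profile`."""
    found = []
    for device in vm.config.hardware.device:
        name = getattr(getattr(device, "backing", None), "vgpu", None)
        if not name:
            continue  # not a vGPU-backed device
        if profile is None or name.lower() == profile.lower():
            found.append(device)
    return found


def remove_last_vgpu(vm, profile=None):
    """Remove the last matching vGPU; return the task, or None if none found."""
    matches = vgpu_devices(vm, profile)
    if not matches:
        return None

    from pyVmomi import vim  # assumes: pip install pyvmomi

    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.remove,
        device=matches[-1])
    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
```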
Lastly, we bring this together and create a VM programmatically. This script is just a slight modification of the create-a-VM script from the pyVmomi Community Samples. You’ll see we took a bit of the code from the add-a-vGPU-to-a-VM script and pasted it into the create-a-VM code sample. That adds the vGPU to the VM when we create it. Now we just run the code using a quick script.
All of that fills up a 30-minute session really quickly, especially with questions.
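To show the shape of that combination, here is a hedged sketch rather than the actual modified community sample. It assumes you have already looked up a VM folder and resource pool, and it locks the memory reservation to the maximum on the assumption that vGPU VMs need their memory fully reserved. The defaults (CPU count, memory, guest ID) and the function names are placeholders for this example.

```python
def base_vm_settings(name, cpus=4, memory_mb=16384,
                     guest_id="ubuntu64Guest"):
    """Plain settings for the new VM; every default here is a placeholder."""
    return {
        "name": name,
        "numCPUs": cpus,
        "memoryMB": memory_mb,
        "guestId": guest_id,
        # Assumption: a vGPU VM needs all of its memory reserved.
        "memoryReservationLockedToMax": True,
    }


def create_vm_with_vgpu(folder, resource_pool, datastore_name, name,
                        profile, **overrides):
    """Create a VM that already has one vGPU attached; returns the task."""
    from pyVmomi import vim  # assumes: pip install pyvmomi

    # Same device construction used when adding a vGPU to an existing VM.
    backing = vim.vm.device.VirtualPCIPassthrough.VmiopBackingInfo(
        vgpu=profile)
    device = vim.vm.device.VirtualPCIPassthrough(key=-1, backing=backing)
    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        device=device)

    spec = vim.vm.ConfigSpec(
        files=vim.vm.FileInfo(vmPathName=f"[{datastore_name}] {name}"),
        deviceChange=[change],
        **base_vm_settings(name, **overrides))
    return folder.CreateVM_Task(config=spec, pool=resource_pool)
```

The result is an empty shell with virtual hardware only. It has no disks, NICs, or OS, which matches the scope of the session.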
What to do with this?
Those bits of code are all you need to start creating VMs for AI/ML workloads. You need to incorporate them into your automation processes.
I didn’t deploy an OS or install the vGPU drivers during this session. That’s really outside the virtual hardware side of things, and it’s the same reason I don’t deploy an app in this session. Additionally, I’ve found that AI admins get really particular about how these things get set up and would get very cranky about an outsider like me saying how it should be done.
A few other things you should know. First, you need to use a vGPU profile that consumes the entire GPU if you want to have multiple vGPUs on a VM. Partial vGPU profiles will not work. So in my demo with the P4 GPU, I need to use the P4-8Q or P4-8C profile for the VM to be able to power up. The other part of that is the profiles have to be the same for all vGPUs in the VM, so if you have a P4-8C and a T4-8C, those can’t be on the same VM. They all have to be the same.
Hopefully this content is helpful. If you have questions or want to see more, drop a comment at the end of the blog.
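Those two rules are easy to check before you reconfigure anything. The sketch below is only an illustration and leans on an assumption about naming: that the number after the hyphen in a profile name such as `grid_p4-8q` is the framebuffer size in GB. Verify that against the profiles your hosts report before relying on it.

```python
import re

# Assumed naming pattern: "<prefix>-<framebuffer GB><type letter>...".
_FRAMEBUFFER = re.compile(r"-(\d+)[a-z]\w*$")


def framebuffer_gb(profile):
    """Parse the framebuffer size from a profile name, or None if unclear."""
    match = _FRAMEBUFFER.search(profile.lower())
    return int(match.group(1)) if match else None


def multi_vgpu_problems(profiles, gpu_memory_gb):
    """Return reasons a multi-vGPU VM would break the two rules above."""
    if len(profiles) < 2:
        return []  # the rules only matter with more than one vGPU
    problems = []
    unique = sorted({p.lower() for p in profiles})
    if len(unique) > 1:
        problems.append("all vGPUs on a VM must use the same profile")
    for profile in unique:
        size = framebuffer_gb(profile)
        if size is None:
            problems.append(f"cannot parse framebuffer size from {profile}")
        elif size != gpu_memory_gb:
            problems.append(
                f"{profile} does not consume the whole "
                f"{gpu_memory_gb} GB GPU")
    return problems
```

An empty list means the combination passes both checks. Anything else is a reason the VM is likely to fail at power-on.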
Now for some fun stuff
I’d like to thank the VMware {code} team for all their help promoting this session. They did a fantastic job with the promo video for this. You get to see me in my cape and WonderNerd glasses.
I did another version of a video tease for this. It was decided it may be a bit risqué and is almost not safe for work. If you would like to watch it, send me a DM.
I hope you enjoyed VMworld!
May your servers keep running and your data center always be chilled.