New PowerCLI Module – Finding vGPU Profiles


For those who've seen or are using my VDI by day Compute by Night PowerShell scripts, you may have noticed that the vGPU profiles are entered manually in an array. Wouldn't it be nice if those vGPU profiles could be easily captured and loaded into an array automatically? That's what this module does. This post is a deep dive on the PowerCLI module for finding vGPU profiles. It is now a key module in the VDI by day Compute by Night scripts and returns an object collection of vGPU profiles.

List of vGPU profiles supported by a host as shown in the VMware Managed Object Browser (MOB)

The vGPU_Profiles_In_Environment PowerShell module checks hosts, retrieves the supported vGPU profiles, and is now called in the VDI by Day script. This new module makes the script more efficient because it no longer has to iterate through every vGPU profile, including profiles for GPUs that aren't in any of your hosts. You could always comment out the array entries for cards you aren't using, but then you have to keep the array up to date by hand.

You're probably wondering at this point how it works and what set of commands sits at the core of this design. That's what we'll dig into first. Then I'll go into how I used that to create this new module.

Breaking Down The Command

Getting the vGPU profiles can be done with two lines of code.

get-vmhost -state $vGPUHostState -location $vGPULocations | ForEach-Object { #iterate through the hosts
	#Do other stuff here
	echo $_.ExtensionData.Config.SharedPassthruGpuTypes 
}

Now you probably don't want to just drop this code in and run it, because there's a pretty good chance you're going to get some errors. Unless every host that Get-VMHost retrieves has a GPU in it, your script won't like the call to .ExtensionData.Config.SharedPassthruGpuTypes. Most likely it will throw an error and kill your script. That's part of the reason those two lines are separated by several other lines of code in this module.
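If you do want to experiment with just those lines on their own, here is a minimal, hedged sketch (not the module itself) that guards the property access first; the host filter values mirror the defaults the module uses later.

Get-VMHost -State "connected" -Location "*" | ForEach-Object {
	$profiles = $_.ExtensionData.Config.SharedPassthruGpuTypes
	if ($profiles -and $profiles.Count -gt 0) { #only report hosts that actually expose vGPU profiles
		Write-Output "$($_.Name): $($profiles -join ', ')"
	}
	else {
		Write-Output "$($_.Name): no vGPU profiles reported"
	}
}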

The way this code works is that it takes the array of ESXi hosts returned by Get-VMHost and iterates through them in a ForEach-Object loop. Inside the loop, the current host is represented by the automatic variable $_. With the host in hand, we can check whether it has the .ExtensionData.Config.SharedPassthruGpuTypes property.

If you look up .ExtensionData.Config.SharedPassthruGpuTypes in the Managed Object Browser for your vCenter (provided you have a host with an NVIDIA GPU), you will see that it returns a string array. That array contains all the supported vGPU profiles for that GPU. It also means you can pass that array around, or iterate through it and perform some work on the profiles, like I did in the new module. That's where we're heading next: I'm going to break down how the new module works.
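To make the string handling later in the module easier to follow, here is a hedged example of what one of those profile entries looks like and how it splits apart (grid_p4-8q is just the sample profile used later in this post):

$vGPUProfile = "grid_p4-8q"                                   #sample entry from SharedPassthruGpuTypes
$CardType    = (($vGPUProfile -split "_")[1] -split "-")[0]   #p4
$Profile     = ($vGPUProfile -split "-")[1]                   #8q
$ProfileNum  = $Profile -replace "[^0-9]" , ''                #8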

Exploring the Module

You can get the module we're looking at from my GitHub repo. We're going to break down the lines of code that make up the module, starting at the top and working our way down.

The first several lines of code define the function. It takes two optional arguments, $vGPULocations and $vGPUHostState. The first tells the function where to look for vGPUs; by default it looks at all (*) hosts. $vGPUHostState is the state of the host in vSphere, which can be connected, disconnected, notresponding, or maintenance. As of vSphere 7.0 U2 you can pass only one of these states to Get-VMHost. This shouldn't be a problem: rarely will you want to run this against anything but connected hosts, and rarer still against multiple states.

Function vGPUsInASystem {
	param(
	
	[Parameter(Mandatory = $false)]
	[string]
	$vGPULocations,
	
	[Parameter(Mandatory = $false)]
	[string]
	$vGPUHostState
	# Valid states (connected,disconnected,notresponding,maintenance) or comma separated combination 
	)

# Take care of function parameters
		if("" -eq $vGPULocations){ #if nothing is passed, look across all locations
			$vGPULocations = "*"
		} 
		if("" -eq $vGPUHostState){ #if nothing is passed, default to connected hosts
			$vGPUHostState = "connected" #,disconnected,notresponding,maintenance"
		}

We then instantiate the list of vGPUs as a collection, with some additional information you can't get directly from the ESXi hosts.

# Create a list of GPU Specs
		[System.Collections.ArrayList]$vGPUlist = @()
			#Name, vGPU per GPU, vGPU per Board, physical GPUs per board
			#Removed examples...
			#Null
			$obj = [pscustomobject]@{CardType="empty";vGPUname="default";vGPUperGPU=0;vGPUperBoard=0; pGPUperBoard=0}; $vGPUlist.add($obj)|out-null #catch any non-defined cards and force them out as zeros
		#help from www.idmworks.com/what-is-the-most-efficient-way-to-create-a-collection-of-objects-in-powershell/

You'll notice most of this is comments, with examples of how the entries are formatted. This ties back to how we defined the collection prior to this script. We follow the same format, which is nice because the other scripts can use it without major modification. You will also note that we create a single null, or empty, object in the collection. This serves as a catch for when a host without any GPUs is passed.
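For reference, a manually defined entry (the kind documented by the removed example comments) follows the same format as the commented-out m10 line further down in the module. Here is a hedged P4 example; the counts are illustrative:

			$obj = [pscustomobject]@{CardType="p4";vGPUname="grid_p4-8q";vGPUperGPU=1;vGPUperBoard=1; pGPUperBoard=1}; $vGPUlist.add($obj)|out-null #example entry: one 8q vGPU per GPU, one GPU on the board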

This is where we get into the meat of the function. The block of code below gets a host view and starts processing it. We start with a try block to catch any errors; remember, I said that if you ran just those few lines of code it takes to get the profiles, they may fail. Hence we should catch errors even though we also test for failures. Then we create our host view and loop through the objects in a ForEach loop, all in one line.

We then set the $CurrGPU variable to 'empty' before we start iterating through the host, so we can do some garbage collection within the loop. We then take the current host ($_) and create a new view with Get-VMHostPciDevice, where we are looking for a display controller with the NVIDIA name, using a wildcard at the end to capture any such cards. This view is then run through a ForEach loop (because a host may have more than one GPU).

This assigns the GPU type to the CurrGPU variable. This process relies on an assumption on my part: I'm assuming that these hosts follow the manufacturer's guidelines of one type of GPU per host. (Yes, I know people who have put multiple GPUs into a single host and it doesn't "break.") If you wanted, you could turn CurrGPU into an array, capture all the card types per host, and do a bit of extra processing.

Once we have that, we do some garbage collection and check the existing GPU collection for any identical cards with a fun little Where / Select clause.

Try {
		#get-vmhost -state $vGPUHostState -location $vGPULocations | ExtensionData.Config.SharedPassthruGpuTypes | ForEach-Object {
		get-vmhost -state $vGPUHostState -location $vGPULocations | ForEach-Object { #iterate through the hosts
			#echo '------------------------------------------------------------'
			#echo "Host: " $_.name
							
			$CurrGPU = 'empty' #Set to empty so it catches the garbage collection if the host has no GPUs
			$_ | Get-VMHostPciDevice -deviceClass DisplayController -Name "NVIDIA Corporation NVIDIATesla*" | ForEach-Object {
				$CurrGPU = ($_.Name -split " ")[3] #only get the last part of the GPU name ie P4 
			} #this will only get the last item in the list
			
			#Echo 'Looking for GPU: ' $CurrGPU
			#check if GPU is already in the list, if so skip it
			$GPUalreadyHere = $null #Set things to Null to make sure it's caught in the check below
			$GPUalreadyHere = $vGPUlist.CardType | where { $_.ToLower() -eq $CurrGPU.ToLower() } | Select -First 1;  #Find if the GPU is already in the array

In the next block of code we check to make sure the GPU hasn't already been added to the array; if it has, there's no point in adding it again. Assuming it's a new GPU, we create a variable for the largest profile size on the host and start it at 0. We then go to the host ($_), get ExtensionData.Config.SharedPassthruGpuTypes, and iterate through the entries with a ForEach-Object loop.

			if ($GPUalreadyHere -eq $null){  #The GPU is not in the array
				
				#Added 8-3-21 as the profile size changed in a previous vSphere release
				$LargestProfileSize4Host = 0 #Set to 0 for garbage collection
				$_.ExtensionData.Config.SharedPassthruGpuTypes | ForEach-object { #iterate through the card's supported configs and find the largest size

Inside this ForEach loop we start the work that will feed the object entries for the GPU collection. We begin by pulling just the profile portion out of the returned entry (the 8q in grid_p4-8q); the grid card prefix is understood at this point and not needed.

					$CurrProfile = ($_ -split "-")[1] #Get just the profile  (ex: 8q)
					#echo $CurrCard " : " $CurrProfile

The next bit of code handles some special cases that are tied more to using this for VDI than for AI workloads, because they focus on older profiles. The first two if statements check for the 2b4 and 1b4 profiles, which exist on some older cards. They will eventually age off as those cards reach end of life and will no longer be a concern for the code. For now we keep them in and equate them to their 2b and 1b counterparts. We also capture the profile number here for use later in the code.

					#Safety Check for 2b4 and 1b4 profiles which should be removed eventually
					if ($CurrProfile -eq "2b4") {
						$CurrProfile = "2b"
					}
					if ($CurrProfile -eq "1b4") {
						$CurrProfile = "1b"
					}
					#echo "==============="
					#echo $CurrProfile
					$ProfileNum = $CurrProfile -replace "[^0-9]" , '' #get just the profile number
					#echo $ProfileNum

Next we check whether the current profile number is larger than the largest profile size we have recorded for this host so far and, if it is, we record it. We do this so it doesn't matter in which order the profiles are listed in the vSphere array we are iterating through. We then close out this first pass through the profiles.

					if ($ProfileNum -gt $LargestProfileSize4Host) {
						$LargestProfileSize4Host = $ProfileNum #find the largest profile size and set it
						#echo "Largest Profile Size set to: " $LargestProfileSize4Host
					}
				}

We follow this block by looping through the vGPU profiles of the given host ($_) using ExtensionData.Config.SharedPassthruGpuTypes again. This second loop through the vGPU profiles is what adds them to the collection. It starts by getting the card type and the profile, assigning them to the CurrCard and CurrProfile variables using split. We then take care of the 2b4 and 1b4 profiles again for this loop. Then we finish this block by retrieving the profile number (the 8 in 8q) and assigning it to the ProfileNum variable.

				$_.ExtensionData.Config.SharedPassthruGpuTypes | ForEach-object { #iterate through the card's supported configs
					#echo "========================================================"
					#echo "vGPU Profile: " $_ #(ex: grid_p4-8q)
					
					$TempCard = ($_ -split "_")[1] #remove the grid-card entry from the profile name 
					$CurrCard = ( $TempCard -split "-")[0] #get the card type of the profile string (ex: p4)
					
					$CurrProfile = ($_ -split "-")[1] #Get just the profile  (ex: 8q)
					#echo $CurrCard " : " $CurrProfile
					
					#Safety Check for 2b4 and 1b4 profiles which should be removed eventually
					if ($CurrProfile -eq "2b4") {
						$CurrProfile = "2b"
					}
					if ($CurrProfile -eq "1b4") {
						$CurrProfile = "1b"
					}
					#echo "==============="
					#echo $CurrProfile
					$ProfileNum = $CurrProfile -replace "[^0-9]" , '' #get just the profile number
					#echo $ProfileNum

Now it's necessary to deal with a couple of M series cards, specifically the M60 and M10. These cards have multiple GPU chips on them, which means they support two or more profiles each, so we need to account for that by setting GPUsOnBoard and setting the ProfileNum variable correctly for each card. In the near future this will also need to be done for the A16 GPU, which has 4 GPU chips as well.

					#########################################################################
					#Deal with M series cards even though not likely to use this
					#########################################################################
					$GPUsOnBoard = 1 #Everything but M series cards have a single GPU on the board
					if ($CurrCard.ToLower() -eq 'm60' -or $CurrCard.ToLower() -eq 'm6'){ #These two cards have 2 GPUs on the board
						$GPUsOnBoard = 2 #Set the number of GPUs on the board
						if ($ProfileNum -eq 0){ 
							$ProfileNum = 0.5 #these are the only cards with 0 profiles, which is technically 0.5 for the math
						}
					}
					if ($CurrCard.ToLower() -eq 'm10'){ #This card has 4 GPUs on the board
						$GPUsOnBoard = 4 #Set the number of GPUs on the board
						if ($ProfileNum -eq 0){ 
							$ProfileNum = 0.5 #these are the only cards with 0 profiles, which is technically 0.5 for the math
						}
					}

Now we start looking at profile sizes and how they relate to our vGPUs. We take the ProfileNum we captured in the previous lines of code and compare it to the largest GPU profile size for the host, which starts at 0 and keeps growing. We then check that LargestProfileSize4Host is greater than 0 (we don't want any division-by-zero issues) and divide LargestProfileSize4Host by ProfileNum to find out how many vGPUs per GPU chip are supported, which is the vGPUperGPU variable. Lastly we multiply vGPUperGPU by GPUsOnBoard to get the total number of that profile supported for the board. For example, on a P4 the largest profile is 8, so for the 2q profile that's 8 / 2 = 4 vGPUs per GPU, and with one GPU on the board, 4 vGPUs per board.

					#########################################################################
					#Assumes that top most profile is the largest profile returned, so a P4-8q is largest
					#Assumes we are not mixing card types in the same hosts
					#########################################################################
					#echo $ProfileNum
					
					if ($ProfileNum -gt $LargestProfileSize4Host) { #if this profile is larger than the current max, record it as the largest GPU profile size
						$LargestProfileSize4Host = $ProfileNum
					}
					
					#Safety check to avoid division by 0
					if ($LargestProfileSize4Host -gt 0){
						$vGPUperGPU = $LargestProfileSize4Host / $ProfileNum #(ex: 1 vGPU per board)
						#echo "Max vGPU per GPU: "  $vGPUperGPU
					}
					
					$vGPUsPerBoard = $vGPUperGPU * $GPUsOnBoard #Set the number of vGPUs per board based on GPUs per board times number of vGPUs per GPU chip

Once we have all that information we can add the vGPU to the object list. All the previous code worked out our profile counts, GPUs per board, and more; amazing what you can get from a little list, isn't it? There are several commented-out echoes to validate the results. We then add the vGPU profile to the array, exit the inner ForEach loop, exit the conditional that ensured the GPU hadn't already been processed, and then exit the ForEach loop over the hosts ($_). At that point we can return $vGPUlist to the calling program and conclude the try block.

					#echo '++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++'
					#echo 'Add entry to array'
					#echo $_
					#$obj = [pscustomobject]@{CardType="m10";vGPUname="grid_m10-8q";vGPUperGPU=1;vGPUperBoard=4; pGPUperBoard=4}; $vGPUlist.add($obj)|out-null
					$obj = [pscustomobject]@{CardType=$CurrCard;vGPUname=$_; vGPUperGPU=$vGPUperGPU; vGPUperBoard=$vGPUsPerBoard; pGPUperBoard=$GPUsOnBoard}; $vGPUlist.add($obj)|out-null
					#echo '++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++'
				}					
			}
			#echo $vGPUlist
			#echo "Just about to return"
			
		}
		#echo $vGPUlist
		#echo "this is what we want"
		return $vGPUlist
	}

The last part of this function is the catch block. If anything went wrong that we didn't test for and handle, it lands here. We use Write-Host to make sure the error is seen, as this should be presumed to be a fatal event. We also return -1, an invalid value, so the caller can test for it. An example of how to call the function sits at the bottom of the file.

Catch {
		write-Host "Error creating entries for vGPUs"
		#echo "failed to create entries for vGPUs"
		return -1 #return an invalid value so user can test
		Break #stop working
	}
	
}

#echo "pre call"
#Example: vGPUsInASystem "*" "connected"

vGPUsInASystem "*" "connected" #, maintenance"
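As a hedged example of consuming what the function returns (the property names come from the objects built above; filtering out the placeholder "empty" entry is optional):

$vGPUProfiles = vGPUsInASystem "*" "connected"
$vGPUProfiles | Where-Object { $_.CardType -ne "empty" } | Format-Table CardType, vGPUname, vGPUperGPU, vGPUperBoard, pGPUperBoard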

That pretty well covers the way this module works. If you have questions, let me know. If you want to improve the code, please be sure to fire off a new branch.

May your servers keep running and your data center always be chilled.

Permanent link to this article: https://www.wondernerd.net/powershell-module-finding-vgpu-profiles/

Talk Nerdy to me & Updates at VMworld


It's that time of year again. It's VMworld time! Hopefully you've taken advantage of the fact that it's free!!! Yup, it costs you nothing, zero, zip, zilch to attend. So, if you haven't registered, get over to VMworld.com and get registered. If you don't register I won't talk nerdy to you!

This year VMworld is a virtual event again, and that's fine; I want everyone to be safe. Until friends from all over the world can join in person, this is probably the best way to do a conference. So if it's another year of sitting at my desk in Zoom calls till the wee hours of the morning, I'm OK with that, and hopefully you'll join us.

Aside from getting together with friends, there are many things I really enjoy about VMworld. Of course there are the labs and all the new announcements. There is also the sheer volume of knowledge being shared. And I know what you're thinking: "You'll go back and catch the videos later." After all, it's all pre-recorded content.

Let's talk about both going back to watch later and the content being pre-recorded. We both know you're not going to find the time to go back and watch conference content later. I still have conference content from VMworld 2013 that I haven't caught up on yet, and you probably won't go back and catch up either. So what I do is block that time off in my calendar, set my IM status to do not disturb, and plan to avoid answering calls or emails. I explain to folks that this is my (and their) opportunity to stay up to date and relevant in the field of virtualization.

Now let's talk about pre-recorded sessions. It's true that some sessions are pre-recorded this year, like the vBrownbag Tech Talks; others aren't. I've only seen a few tech talks where there has been time for questions and where anyone had questions to ask. Most of the time you watch the session and then catch the presenter when they walk off stage. Whether a session is live or recorded is really minor to me. I plan on watching even when I know they're pre-recorded. Won't you join me?

In fact, a few weeks ago I pre-recorded session VMTN2835, a vBrownbag session where I talk about new updates to my VDI by day compute by night script. (I'll link to my update blog once VMworld is under way.) If you're interested in working with vGPUs in PowerCLI it's definitely something to catch.

There are several advancements I talk about in the session. For example, I cover changes in 7.0 U2, their impact on the scripts, and how those have been addressed in this latest set of scripts. If your organization is even sort of interested in getting more use out of your VDI environment, you probably want to catch this session. Besides, you already know where to find me afterwards to ask questions.

I’m guessing those updates aren’t the VMworld updates you were expecting… I wouldn’t want to spill the beans on anything cool in the works for VMworld. Regardless, there are going to be some really good updates coming out of VMworld.

Along those lines, something you may want to start brushing up on is AI/ML in virtual environments. Yes, you can run AI/ML workloads as VMs; it works, and I've been evangelizing it since 2017. As a matter of fact, NVIDIA had a big announcement about the NVIDIA AI Enterprise platform for running AI in a virtual environment a few days ago. (BTW, just in case you missed it and in the interest of full disclosure, I work for Dell.)

If you're really into running AI workloads in a virtual environment (and who isn't?), join me for my live VMworld {code} session, CODE2778, where I'll talk nerdy about using Python to automate the creation of AI/ML VMs with vGPUs. We're going to construct Python code that allows admins to create VMs with vGPUs. One thing that is common with AI VMs is the need for multiple vGPUs in a single VM. Not only will we delve into how to add multiple vGPUs, we're going to talk about the other things you need to be aware of for multiple vGPUs to work. Sounds like a VMworld session you don't want to miss.

That said, if you code, and I'm sure you do, you don't want to miss VMware {code} at VMworld, an entire program track geared towards programmers. Last year's VMware {code} conference was stellar, and from talking with the {code} team, this year is going to be just as good. If you want to see all of the {code} sessions in the VMworld Content Catalog, expand the "session type" option and then select "VMware {code}".

You may be curious what my picks are for sessions to catch at VMworld. You probably guessed the VMware {code} sessions, and here are some of my other picks that I don't want to miss.

Just in case you can't guess, there's a lot happening around AI at VMworld this year; my list is evidence of that. Searching the catalog, there are, at the time of this writing, 23 sessions that have NVIDIA in their abstract or title. I think that's the most sessions I've seen around NVIDIA and GPUs.

Hopefully all of this is a pretty good reason to attend VMworld this year. It looks like there are going to be a lot of top quality sessions and a chance to learn a lot of new information. Hopefully you will see me in some of the social areas and are able to say hi. I look forward to seeing you there. Now go register if you haven’t!

May your servers keep running and your data center always be chilled.

Permanent link to this article: https://www.wondernerd.net/talk-nerdy-to-me-updates-at-vmworld/

Harnessing the Power of Python to Control vGPU Management in VMware vSphere – GTC 2021 Session E32023


In this blog I provide some insights on managing vGPUs in vSphere with Python. It sounds like a simple task to control vGPUs in vSphere, but it's not as easy as it appears. I got deep into this as Johan and I were working on content for this session. I'll cover all that and more in this post.

Just Here for the Slides

If you are here for the session slides, you’re in luck. Below you’ll find a copy of the slide deck. This is the same deck you’ll get through the GTC site. For the video of the session you’ll need to log in to the GTC web site. If you have questions about the session be sure to use the Contact Me page or leave a message in the comments.

https://www.wondernerd.net/wp-content/uploads/2021/04/E32023Tony-Foster-_-Johan-Van-AmersfoortHarnessing-the-Power-of-Python-to-Control-NVIDIA-vGPU-Management-in-VMware-vSphereFINAL.pdf

For those just getting started with vGPUs or wondering what they are: they are virtual GPUs. What that means is that VMware, in this case, abstracts the physical GPU so that multiple VMs can use the physical hardware at the same time. That means you can have multiple VMs all running graphics applications, or maybe even an AI workload, because the GPU has been "sliced up." You can also add up to 4 vGPUs to a single VM, making the VM a vGPU powerhouse. It's really fun stuff to work with, in my opinion.

Pythoning it up

That gets to the Python side of things. We aren't talking about how you program vGPUs as part of a CUDA or PyTorch operation; for the most part that is done the same as if it were a physical host. (I actually delved into this in a GTC 2018 session, S8483 – Empowering CUDA Developers with Virtual Desktops.) Instead, the real focus of this session is how you manage the back-end virtual hardware programmatically. In other words, it's great to manually add a vGPU to a VM through vCenter, but that isn't all that helpful when you want to automate the process.

After all, most IT admins would rather write some code to do a common repeatable task than be the monkey trained to do the same thing over and over again. You can do all the stuff we talked about in this session through PowerShell and PowerCLI, and I've done several blogs on how to automate that. That's all well and good, unless you are looking for a different type of control: Python.

Python is a modern programming language that's widely used and can easily be ported from one OS to another. Plus, compared to other languages it doesn't have nearly as steep a learning curve. There are also a lot of great plugins and modules for Python, including PyVmomi. All of these are reasons why many programmers who are developing software around vSphere are using Python.

Earlier I mentioned that this programming may not be as easy as it first appears. Here's why: GPUs and vGPUs don't have a lot of documentation about them as programming objects. If you do a Google search for adding a vGPU to a VM with Python, you're not going to get a lot of results. (My Google search returned zero meaningful results in the first two pages.) Why? Because people haven't needed to do this all that much… until now.

AI/ML/DL/HPC run it virtually

Why is it needed now? Much of it has to do with the rise of AI/ML/DL/HPC on virtual platforms. (I discussed this in a 2018 VMworld session, VAP2340BU – Driving Organizational Value by Virtualizing AI/ML/DL and HPC Workloads.) And some of it with VDI. Up until late 2020, many thought it was absurd to put these "special workloads" on a virtual environment, because "they require bare metal performance." Only a handful of folks understood what power virtualizing them held for the enterprise. No longer is AI the stuff of science experiments; it has real business value and needs to be treated like any other business IT stack. And what do businesses do with IT stacks? They automate.

So how do we crack open the vault and start programmatically controlling our vGPU infrastructure? Where I started was the VMware vHPC toolkit on GitHub. It has a lot of good samples and examples; the best thing to do is a Ctrl+F and search for "GPU." The content is fantastic, but the code is really chunked up, so you have to bounce all over to figure out which objects you are working with. That's great once you understand the infrastructure and what you're trying to code against, not so much when you are learning it.

The Secret Decoder Ring

That's where the vSphere MOB comes into play. Don't worry, there's no offer to refuse; MOB stands for Managed Object Browser. You can get to it by appending /mob to the end of your vSphere URL. The MOB allows you to browse all the objects managed by vSphere, not just GPUs. The problem is finding them. For the GTC session we put together a little MOB cheat sheet slide to help you figure out where things are in your environment and what type of objects they are. This will help you, as you code, figure out which object you need to access and when.

In the session we talk about two different ways of calling objects. You can navigate to them directly or access them through the use of a container view. Calling them with a container view looks like this:

TempVMlist = HostContent.viewManager.CreateContainerView(HostContent.rootFolder,[vim.VirtualMachine], True)

The container view takes a content type, vim.VirtualMachine in this case; the MOB provides these object types so you can easily specify what sort of object you are looking for. You can also navigate to objects directly.

DataCenterContent = HostContent.rootFolder.childEntity[0] #Assume single DC       
VMs = DataCenterContent.vmFolder.childEntity

These two lines of code navigate directly to the VMs in a vSphere environment instead of creating a container view. This path can be found using the MOB as well. Having context for both approaches makes it easier to understand what you are doing.

The main areas for objects are:

  • managed_object_ref.config.sharedPassthruGpuTypes #Shared Passthrough GPUs
  • ChildVM.config.hardware.device #VM child hardware device listing
  • isinstance(VMVirtDevice, vim.VirtualPCIPassthrough) #Has a virtual PCI passthrough device
  • hasattr(VMVirtDevice.backing, "vgpu") #Has a backing attribute of vgpu
  • VMVirtDevice.backing.vgpu #Device Backing
  • VMVirtDevice.deviceInfo.label #Device label eg. grid_p4-8q
  • VMVirtDevice.deviceInfo.summary #Device summary

All of the code we discussed in the session is available on GitHub so you can try it out for yourself. It’s important to note there is very little error checking in these scripts, and that is intentional because we don’t know how you intend to use them. So if you intend to use them in production be sure to add the appropriate error handling.

In the repository we provide details on how you can find which hosts have GPUs, which VMs have vGPUs, and how to add a vGPU to a VM and remove it. You probably want to know, as I did, how you get to the stats and all the details you get with the nvidia-smi command.

Digging Deeper With NVML

That's something I was hoping to share in the session. NVIDIA provides a tool called the NVIDIA Management Library, or NVML, which can provide an insane amount of information about your GPUs and vGPUs. The only problem is that VMware doesn't allow this to be exposed through the vSphere API. That means the only way to get to the information provided by NVML is through a terminal session (SSH).

It took several emails back and forth to make sure I wasn't missing anything and that the only way to get the NVML goodness is through SSH. I can confirm, as of this writing, that this is the case. Unfortunately, that programming is a bit beyond the scope of both the GTC session and this blog.

Hopefully this has helped unlock the secrets needed to programmatically manage your vGPUs with Python.

May your servers keep running and your data center always be chilled.

Permanent link to this article: https://www.wondernerd.net/harnessing-the-power-of-python-to-control-vgpu-management-in-vmware-vsphere-gtc-2021-session-e32023/

Sneak Peek at the Brand New Jetson Nano 2GB


How would you like a sneak peek at the brand new NVIDIA Jetson Nano 2GB Developer Kit? I have the hookup for you. NVIDIA sent one out to me as part of their NGCA social media program prior to today's (October 5th) launch.

Let’s get right to the details…

We'll start with the price… you can order a Nano 2GB Developer Kit for $59. It is designed for developers, STEM opportunities, and IoT applications, all price-sensitive areas that also need GPU power for AI-based opportunities.

So what do you get for $59? You get a lot. This has many of the same features that its big brother the Jetson Nano has.

There are 4 big differences between the Nano and the Nano 2GB. They are:

  • The Nano 2GB ships with 2GB of RAM, hence the name
  • It comes with a wireless adapter! (More on this later)
  • There are fewer ports on the Nano 2GB
  • It has a USB-C connector to provide a larger power draw

There are also some minor changes. We'll dig into both the major enhancements and some of the minor changes with the Jetson Nano 2GB.

Let's start with the memory in the Nano 2GB. This isn't as much as its older brother. In my opinion this is actually better, especially for STEM and IoT applications. It's the ideal size for getting started with AI, where you don't have large data sets and are probably using a pre-trained model. All of this really reduces the need for the extra memory and keeps the price reasonable.

Speaking of reasonable… the Nano 2GB comes with a wireless adapter, which is awesome. Those who do development or STEM work really need wireless capabilities, and in some IoT instances wireless is also a requirement. In my mind this is a godsend.

Now, it should be noted that in some regions the Nano 2GB won't come with a wireless adapter; for those regions the price is reduced to reflect the difference. Talking with the program team, they are working on an adapter for the regions that don't currently have one.

The Nano 2GB Developer Kit does not have a DisplayPort connector, and for this I am thrilled!!! I no longer need to worry about youth who are using them for STEM projects trying to force HDMI cables into the DisplayPort. This is one of my favorite physical enhancements to the Nano 2GB.

Additionally, there is only a single USB 3 port on the Nano 2GB. This makes good sense to me. With my development and STEM work (and I assume for others) I very rarely connect even a single device that needs USB 3 speeds, much less 3 or 4 of them. Wi-Fi, keyboard, mouse, and maybe external storage are my typical USB connections.

The last exciting physical aspect that stands out for me is that with the Nano 2GB you don't need a special barrel-style power supply when you need full power. The Nano 2GB uses a USB-C connector for power. For me this is an ideal enhancement to the Developer Kit.

With a USB-C connector it's so much easier to find power to run the Jetson Nano 2GB, especially when working with youth on STEM projects. Think about it… if you have an Android cell phone, what is the power connector? If your phone is newer than two years old it's probably a USB-C charger… which means you've already got a power supply for the Nano 2GB.

There are also some other physical attributes worth mentioning with the Nano 2GB. According to the documentation, in some regions there isn't a set of pins for a cooling fan. The Nano 2GB I received did not have pins for a fan. I don't expect I will get the Nano 2GB that warm, and I probably will not put it in a case.

NVIDIA has also moved the control pins, like power, reset, etc., to under the processor on the developer kit board, where they sit in a single row. You can see this in the photo, along with what each pin is for.

The GPIO pins are identical to the Jetson Nano. You can see the pin out in the photo.

You can find details on the Jetson Nano 2GB on the Nano 2GB product page.

It’s time to wrap up this post. Today NVIDIA announced the Jetson Nano 2GB Developer Kit. This is awesome for developers, STEM learners, and IOT applications. It has amazing power in a small package with an equally small price tag. It only costs $59, and can be pre-ordered today from the NVIDIA store.

May your servers keep running and your data center always be chilled.

Permanent link to this article: https://www.wondernerd.net/sneak-peak-at-the-brand-new-jetson-nano-2gb/

Trove of VMware Project Monterey Resources


You may have heard that on Tuesday at VMworld 2020, VMware announced a partnership with NVIDIA called project Monterey. One of the central parts of this is using a SmartNIC (the NVIDIA Mellanox BlueField-2 DPU) for services and boot images. This post details a lot of the materials released as part of the VMworld announcement, as well as some other blogs from VMware on it.

Here is a great picture from Kit Colbert's blog post providing an overview of the components of a SmartNIC, which is central to Monterey.

https://blogs.vmware.com/vsphere/files/2020/09/Project-Monterey-What-is-a-smartNIC.png
https://blogs.vmware.com/vsphere/2020/09/announcing-project-monterey-redefining-hybrid-cloud-architecture.html

Most VMworld content can be found at VMworld.com and is on demand. Some content, though, was live and so may not be available from the site yet.

Monterey was first introduced as part of the VMworld general session, which you can watch on demand as session GEN2859. It will start to give you an idea of what project Monterey is all about.

To dig deeper into the announcement be sure to watch session VI3178 with Pat Gelsinger (VMware) and Jensen Huang (NVIDIA). This gives a good overview of it for those not deep into the technology.

Now let's cover the press releases on project Monterey:

The partnership press release: https://www.vmware.com/company/news/releases/vmw-newsfeed.VMware-and-NVIDIA-to-Enable-Next-Gen-Hybrid-Cloud-Architecture-and-Bring-AI-to-Every-Enterprise.801b9b92-97a9-4a75-9074-116733900cb5.html

Project Monterey press release: https://www.vmware.com/company/news/releases/vmw-newsfeed.VMware-Unveils-Project-Monterey-Re-Imagining-Hybrid-Cloud-Architecture-to-Support-Next-Generation-Applications.89238c82-cabf-4136-b66e-642617f10d40.html

NVIDIA’s partnership announcement: https://blogs.nvidia.com/blog/2020/09/29/vmware-gelsinger-nvidia-huang/?ncid=so-twit-95535#cid=_so-twit_en-us

There are several good blogs that relate to project Monterey that should be part of your review:

NVIDIA blog on what a DPU is: https://blogs.nvidia.com/blog/2020/05/20/whats-a-dpu-data-processing-unit/ <– READ THIS, IT'S IMPORTANT. The DPU is at the heart of project Monterey, so it's important to understand what a DPU and a SmartNIC are.

Kit Colbert’s blog post announcing project Monterey: https://blogs.vmware.com/vsphere/2020/09/announcing-project-monterey-redefining-hybrid-cloud-architecture.html

               Note that the breakout session (HCI3351) mentioned in Kit’s blog is not yet available in the content catalog. The same is true for the round tables HCP3047S and OCTO3150S. These 3 sessions are really good at getting deeper into the technology of Monterey.

               OCTO3150S is a really good session outlining much of the content around project Monterey with Kit Colbert (VMware), Paul Perez (Dell), and Chris Lamb (NVIDIA). When it’s posted, it’s well worth the time to watch and really helps demystify Monterey.

On Thursday, Kit was the day 1 keynote speaker at VMware's Code Connect conference. He dug deeper into project Monterey, providing more technical detail than the sessions listed above or his blog. It's worth the watch if you're interested in how Monterey helps automate the data center at the hardware level. http://vmwarecodeconnect.github.io/CodeConnect2020/

I also tweeted out some screen grabs from Kit's Code Connect keynote while it was happening.


That gets us to the current state of content. Next week, October 5, is the fall GPU Technology Conference (GTC). In my personal opinion the keynote is definitely worth watching; if you have a chance, be sure to watch it live.

Update Oct 5: One of the things that is free to take in from GTC is the keynote, specifically part 5, where the BlueField-2 DPU is discussed.

As part of GTC there are a whole lot of great sessions worth watching that may relate to Monterey, provided you have a GTC pass ($100).

The GTC sessions on DPUs I recommend catching are:

  • The Next Generation of Fully-Integrated Data Centers [A21223]
  • How to Secure Modern Data Centers [A21238] <– Important
  • Securing and Accelerating the Enterprise Data Center with Data Processing Units (DPUs) [A21193]
  • (On demand) DPUs, K8s, and ML: the Future of Compute (Presented by Canonical) [A22573]
  • Dinner with Strangers – NVIDIA Spectrum Ethernet Switch based Interconnects for AI clusters and VMware Based Virtualized Infrastructure [DWS0CT74]

All that said, realize that Monterey is a larger program than just SmartNICs/DPUs (NVIDIA BlueField-2 DPUs, to be exact), so I expect we will see much more announced as Monterey progresses.

I hope this helps expand your knowledge of VMware’s project Monterey.

May your servers keep running and your data center always be chilled.

Permanent link to this article: https://www.wondernerd.net/trove-of-vmware-project-monterey-resources/

TV From Across the Pond


This is just a fun quick post. Have you ever been traveling across the pond and wanted to watch your favorite streaming service or news broadcast? You fire up your tablet, tap on the app or go to the website, and you get… "we're sorry, this content is unavailable outside of the…"

Annoying message when you just want to watch your favorite shows

I have a couple of ways to address this. First, I use PiVPN to have a secure tunnel back home. That works a lot of the time… though sometimes the lag just isn't worth it. The other way I watch the latest episode of that new show is to change where my device appears to be.

To do that I use a service called Unlocator. (There are others out there too.) They have two different features: one is a VPN service, which can be handy; the other is a DNS service that hides your real location. Meaning, if you're like me and want to catch the latest episode of a popular show, just leverage Unlocator and change to a location where you can watch.

Unlocator costs me about $25 USD a year (I just use the DNS service). So when I travel I can get services like Netflix, Hulu, Disney+, and the big 3 national broadcasting services that are here in the States.

You might wonder how to setup Unlocator with your device. It’s actually rather easy.

  1. Login to your Unlocator account to see what your DNS entries should be.
  2. On your device, add those DNS entries to the top of your DNS list. (Don’t worry they have instructions on how to do it for just about any device imaginable.)
  3. Verify Unlocator is working correctly for you by clicking the “check” button in your Unlocator account.
  4. Pull up your streaming service.

That's it, nothing more to it. Now you can watch your favorite programs from across the pond. (Yes, it's really that simple.) I actually have a Raspberry Pi set up with all my favorite streaming services, a VPN, and Unlocator. I just hook it up to the TV in my hotel room and away I go.

Those of you in the States might also wonder if this means you could watch your favorite overseas content, for example on the BBC. That would be wrong, because you're supposed to have a TV license to watch BBC programmes, and the BBC asks you when you load iPlayer if you have one… so I guess the answer is no.


That is my post for today. Hopefully this helps you catch all those great shows you thought you were missing on your travels.

May your servers keep running and your data center always be chilled.

Permanent link to this article: https://www.wondernerd.net/tv-from-across-the-pond/

Farming NVIDIA Jetson Based Thin Clients


In March 2019 NVIDIA released the Jetson Nano, a $99 GPU-equipped embedded microcomputer. I have several and have even distributed them to some brilliant youth to see what they come up with. I've noticed a couple of things about the Nanos that could have very interesting results on how we process data in organizations and smart cities. Let's take a look.

Jetson Nano sitting on its box

The first thing I've noticed: Stratodesk has blogged about using Nanos as thin client systems for VDI with good graphics capabilities (the Nano's GPU contributing to that). This improves the user experience while keeping the cost of desktop endpoints relatively low. Imagine hundreds of Nano thin clients throughout the cubes of an organization… end users getting a great graphics experience at a pretty economical price. Cubes everywhere glowing green with Jetson Nanos as users' monitors power up…

In most organizations, GPU-powered thin clients (or any thin clients for that matter) are used for about 8 hours a day (when users are in the office) and the rest of the time they sit idle, not delivering value to the organization. They may be on or off, just sitting there idling, waiting until the next work day rolls around to add value to the organization.

Pie chart showing that a thin client is only used 33% of the time and the rest of the time sits idle.

The second thing I've noticed: at GTC Silicon Valley 2018, Liqid presented session S8539 on pooling and orchestrating NVIDIA Jetsons for AI and deep learning on the edge, essentially being able to take NVIDIA Jetsons and compose them over a high-speed network in a single chassis as a single unified system. You could have up to 24 Jetsons composed as a single platform running in your data center.

Interestingly enough, after doing some digging, this also includes the Jetson Nano as one of the Jetson types that can be composed. That creates a pretty spiffy AI/DL platform for the data center or a remote location. Pop it in a rack, use the data center cooling and power, and start cranking on those ever-vexing business questions (like figuring out who the masked singer is).

Independently these are very cool options, both of which can take advantage of the Jetson Nano. At this point, cue the WonderNerd and his harebrained ideas. Why not bring them together and do a little cube farming, IT style? I bounced the idea of composing thin clients into a processing system off someone I know at Liqid to see if it had wings. Well, let's start putting the idea together and see what you think.

24 cubes, most idle, which could be brought together as a composeable resource.

The basic idea is (stop me if you've heard this VDI by day concept) that we have a bunch of endpoints with NVIDIA GPUs (like Jetson Nanos) spread through the cubes of an organization. During the day they provide VDI endpoints to users. When the users go home for the day and the endpoints are sitting idle, we recompose them and turn them into a grid-based computing platform (don't confuse this with NVIDIA GRID, which virtualizes GPUs; here we are aggregating them). It's a concept similar to VDI by day and compute by night, only this is endpoint by day and compute by night.

This isn't necessarily a new idea; people have been doing this for quite a while. It's called grid, or distributed, computing. I've been doing grid computing for quite a while, volunteering my systems in a program called World Community Grid (WCG), which has been helping researchers solve various world problems like Zika and childhood cancer. If you have some spare compute cycles to share, I recommend participating in the WCG.

The idea behind grid, or distributed, computing is that you have a bunch of systems distributed throughout an area. Each system reaches out to a master node to get a bit of work to do, processes the job, and returns the result to the master node. The same bit of work would, in most circumstances, be distributed to three or more systems participating in the grid, and the answer in the majority is accepted as the correct result. (Some of the foundations of blockchain stem from distributed computing.)

This probably seems straightforward enough: take a bunch of Nanos and compose them as a grid platform. This is all well and good if that's all they were doing, but their primary purpose is as a thin client endpoint so users can get work done.

You may be asking how thin clients would change state from an endpoint to part of a composed infrastructure system. Thinking about this, it could be done by leveraging some intelligent programming to detect when thin clients are no longer in use and reallocate them to a distributed composable infrastructure (much like what I did with my VDI by day compute by night scripts). With a bit more intelligence, the code could pay attention to usage habits and remove a thin client from a processing system 30 minutes prior to the expected arrival of the person who uses it as a thin client.

You're probably thinking that I'm forgetting one little thing: networking. No one's going to run 10 Gbps lines to end users; that's insane and kills the benefits ($$$) of something like this, and besides, Nanos don't have 10 Gbps links. 1 Gbps links should work in many situations. What may need to happen is a way to add an M.2 card on the thin client to be used as a store-and-forward buffer (no, the Nanos don't currently have one). The microSD storage slot on the Nano is a consideration, but I'm not sure how durable it would be functioning as a cache. With a buffer it would be possible to fully saturate the link in both directions as material is created and transferred throughout the grid. If an M.2 slot were available, hopefully it would sit on the same bus as the Nano's GPU so it would be fast enough to feed the GPU, and with the right caching algorithm it would minimize the impact on the network and individual nodes.

High speed store and forward storage concept diagram for independent nodes to enable caching of incoming and outgoing data.

With some M.2 instances being composable as well, it might be possible to create both a local and a unified storage space on each endpoint. This would allow processing jobs to function as both disaggregated and aggregated processes. This might be an area where VMware vSAN would work, though network speed between Nanos may be an issue. The data would need to be placed locally first and then propagated to other nodes, creating unified storage among all the nodes. This might be one method to aggregate and share processed results among the nodes.

Even with local storage, this would still require the uplinks off the network switches and the switches' backplanes to be pretty beefy. The minimum uplink bandwidth to support something like this would be two 10 Gbps ports LAGed together, and 40 Gbps would probably be preferable.

In most cases it would also be best to limit the size of a composed system to a single switch. In other words, if the switch has 24 ports, the maximum system size is 24 nodes. This helps avoid extra hops and network lag, plus we don't want to congest the northbound network links with east-west traffic from nodes not on the same switch. Again, this is something that could be addressed with some intelligent programming.

With the networking out of the way, what could a Jetson Nano cluster deliver for an organization? Here are the Nano specs:

GPU: 128-core Maxwell
CPU: Quad-core ARM A57 @ 1.43 GHz
Memory: 4 GB 64-bit LPDDR4, 25.6 GB/s
Storage: microSD (not included)
Video Encode: 4K @ 30 | 4x 1080p @ 30 | 9x 720p @ 30 (H.264/H.265)
Video Decode: 4K @ 60 | 2x 4K @ 30 | 8x 1080p @ 30 | 18x 720p @ 30 (H.264/H.265)
Camera: 1x MIPI CSI-2 DPHY lanes
Connectivity: Gigabit Ethernet, M.2 Key E
Display: HDMI 2.0 and eDP 1.4
USB: 4x USB 3.0, USB 2.0 Micro-B
Others: GPIO, I2C, I2S, SPI, UART

Let’s extrapolate that out to a single 24 node composed system and see what the total power of it would be…

Component | Single Jetson Nano Capacity | Combined Capacity of 24 Jetson Nanos
GPU | 128-core Maxwell | 3,072 Maxwell cores
CPU | Quad-core ARM A57 @ 1.43 GHz | 96 ARM A57 cores @ 1.43 GHz
Memory | 4 GB 64-bit LPDDR4, 25.6 GB/s | 96 GB LPDDR4 RAM
Connectivity | Gigabit Ethernet, M.2 Key E | 24 Gbps of combined network connectivity

At the GPU level, this infrastructure would be equivalent to one NVIDIA M6000 GPU, which has 3,072 Maxwell cores. At the time of this writing Amazon is selling the M6000 for $1,999.00 USD. Using a base cost model, a single GPU would win: $1,999 (1 x $1,999) vs $2,400 (24 x $100) for Jetson Nanos, plus all the added complexity (switch, storage, etc.) and coding (operational state, network segmentation, etc.) required for a Nano farm.

Though that's not a true apples-to-apples comparison. The M6000 would be used exclusively for workload processing; a composed system of Jetson Nanos would have a dual function, endpoint and workload processing. That would mean that 8 hours of the day they serve as endpoints and the other 16 (ish) hours they are doing processing. Now, we could weight the cost based on the hours used by each application, which would work out to $800 for thin client duty and $1,600 for processing, though that's not a fair way to look at it either. The thin clients are required (assuming VDI); they are a sunk cost to the organization. In other words, they are already there, and they cost a given amount (for the purposes of this blog, $100) regardless of whether they are used for 8 hours or 24 hours.

That means a better way to approach this cost would be something like: the data processing team pays for the upgraded switches and the software to control the thin clients as a cluster, along with any back-end equipment needed for a composed system. Granted, this still probably wouldn't be economical at a small scale of 24 thin clients… this would need to be a much larger deployment. Think about an office of 500 cubes; that would be 20 composed systems when no one is in the office.

500 cubicles. Idle cubes represented by green boxes with "Idle" in them and blue boxes with a person figure in them.

That's 20 extra M6000-class GPUs working about 16 hours a day on the organization's problems. That works out to roughly 278 extra processing days a year per composed system, or about 5,560 days for 20 composed systems. (5 days a week x 52 weeks a year = 260 weekdays; 260 x 2/3 of a day = 173.3 days; 173.3 + 104 weekend days ≈ 278 days.) This means about 76% of a composed system's time could go to data processing over a given year. That's a pretty good result.

At this time my concept probably doesn't make complete business sense (give it time). The biggest hindrance is someone taking the time to program the logic described. Additionally, I'm not sure what the performance degradation would be from splitting a compute job across 24 nodes instead of running it on a single processor (there is more latency sending a signal over a TCP/IP network than over a few centimeters of copper). Because of these unknowns, I can't say this will save you tons of money; in fact, I'm not sure what the cost of this at scale would be right now, especially if you calculate the cost based on usage or a direct comparison to a dedicated GPU. Still, it's interesting to consider in large organizations.

This thought process does open up a couple of other interesting opportunities, though, where it could prove advantageous.

One is fractional workload processing. Yes, that thing all my AI/ML/DL/HPC folks are wanting for their workloads. Let's say you have a bunch of small jobs that don't consume a whole M6000 GPU; they only consume, let's say, half of the GPU, but you must consume the whole card to run the job. Now let's say you have 1 million of those jobs to run, and each job takes 3 seconds. That's 3 million seconds to run all the jobs. Now let's say I can optimize the operation by splitting it and running it on two systems half the size of the original… that's 1.5 million seconds to process all those jobs. In other words, I've gone from 833.33 hours to 416.66 hours. That's a pretty powerful way to optimize resources to fit the workload. It can also scale up (in other words, 1.5 GPUs instead of 2).

There is also the possibility of doing this programmatically, which would mean a function in the program could determine the optimum processing configuration and compose the infrastructure accordingly. That is much further off, though.

Fractional GPU processing can be done using the composability Liqid proposed in their GTC session mentioned above. It can also be done using virtualization technology such as VMware vSphere. My proposal with Nanos just leverages unused resources in the organization.

Un-optimized utilization consuming a full GPU (left) compared to optimized fractional GPU from composable resources (right).
Fractional GPU workload processing

The second scenario is also interesting, especially with 5G, microservices, containers, smart cities, and many more advancements. Imagine a smart city with hundreds of smart systems processing things in real time, like traffic lights, utilities, etc. Now imagine them functioning as a large unified system, which would allow processing power to go where it is needed.

Think about an intersection that is busy at night, with hundreds of cars traveling through it every hour. Across town there is another intersection that is only busy during rush hour, and yet another that is only busy on the weekends. A composed infrastructure like I have described above could be used to deliver an optimized pool of resources across a smart city. The unused processing resources from one area (purple circle) could be used to enhance nearby areas that need more processing power (red circle) during peak times. It becomes a dynamic, proactive city that responds to the changing needs of its residents and visitors.

All of these are interesting concepts; they just need someone to build them… maybe I'll see if I can build a composable Nano infrastructure in my home lab. If you've already built this, please share it with the readers and me by posting a link in the comments below.

May your servers keep running and your data center always be chilled.

Permanent link to this article: https://www.wondernerd.net/farming-nvidia-jetson-based-thin-clients/

Dell EMC PowerOne


I can finally tell you what I've been working on for the last 17 months. Today Dell EMC announced the PowerOne System. This is what I've been working on; it hasn't been a VDI program or anything like that. My work at Dell has been part of a team creating an autonomous, outcome-oriented, converged infrastructure.

Dell EMC PowerOne unifies Dell infrastructure components as one outcome oriented, converged infrastructure.

You might be asking: what is a PowerOne System? PowerOne is an autonomous converged infrastructure that is a giant leap forward for the industry and provides outcome-oriented results. It is built entirely on Dell components. That means storage, compute, and networking all come from Dell.

  • The storage array for the initial release of PowerOne can be either PowerMax 2000 or PowerMax 8000
  • Workload computing currently runs on the Dell EMC MX7000 with either MX740c or MX840c compute sleds
  • Dell PowerSwitch networking provides connectivity to all of the components
  • At the heart of PowerOne resides the PowerOne Controller, which directs automated, outcome-oriented results

What do "automated" and "outcome oriented" mean? It means that the components needed for workloads are brought together with automation and delivered as a declared outcome. This is a key concept that is unique to PowerOne. Let's dig deeper into this with two examples.

Single rack Dell EMC PowerOne System

Let's say you are at a restaurant. You are given a menu of the food available, and you select your drink, appetizers, entree, and dessert. You don't have to know how to make any of them; the chefs, bartenders, and wait staff know how to make it all for you and bring it to your table complete and with high quality. PowerOne works in much the same way: you declare the outcome you want, PowerOne looks at what it has in inventory and, using stringent best practices, delivers the infrastructure for you to consume.

If you don't dine out all that often, maybe cars are a better model. Consider driving down the road: you set your cruise control to the speed limit, take your foot off the gas, and go with the flow of traffic. You've declared an outcome: you want to go the speed limit. You don't need to know all the complex operations that make that work; you just push a button. It is the same with PowerOne: you declare the outcome and PowerOne delivers it without you having to know all of the inner workings and best practices required to get the result you want.

All this declarative stuff is great, right? So how does PowerOne deliver these outcome-oriented results? They are delivered by the PowerOne Controller, the heart of PowerOne that I mentioned above. The PowerOne Controller is a redundant appliance that manages all of the PowerOne infrastructure and delivers and maintains outcomes.

Dell EMC PowerOne Controller

How does it do that? With a lot of cool programs running inside the PowerOne Controller. Remember, the PowerOne Controller is an appliance. When was the last time you bought a refrigerator and asked the salesperson what sort of refrigerant it used? You probably never have; I bought one last week and didn't even bother to ask.

And you are probably thinking: how am I supposed to use PowerOne and make it fit into my environment? Great question, I'm glad you asked! PowerOne has a fully functional API, so you can leverage it as part of your data center automation tools. That means if you want an automation process to stand up a new vSphere cluster whenever a request is approved, or a script that watches your DevOps environment and adds a new host when resources are constrained, you can do it with PowerOne without needing to know the recipes.
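
To make that a little more concrete, here's a minimal PowerShell sketch of what that "add a host when resources are constrained" idea could look like. The controller hostname, endpoint path, and JSON fields below are hypothetical placeholders I made up for illustration, not the documented PowerOne API; check the official API reference for the real resource names and authentication flow.

# Hypothetical sketch only: the URI path and JSON fields are placeholders,
# not the documented PowerOne API.
$controller = "https://powerone-controller.example.local"   # placeholder hostname
$cred       = Get-Credential                                 # PowerOne Controller credentials

# Stand-in for a real utilization check against vROps, vCenter, or your monitoring tool
$clusterIsConstrained = $true

if ($clusterIsConstrained) {
    # Declare the outcome "one more host in this cluster" (hypothetical endpoint and schema)
    $body = @{ cluster = "devops-crg-01"; addHosts = 1 } | ConvertTo-Json
    Invoke-RestMethod -Uri "$controller/api/outcomes" -Method Post `
        -Body $body -ContentType "application/json" -Credential $cred
}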

Remember earlier when I talked about how outcome-oriented results from PowerOne were like going to a restaurant? The restaurant doesn't tell you how they make the chicken soup, nor do you have to tell them how to make it; you just declare, "I'll have the chicken soup." Everything else is done behind the scenes. The same thing happens with PowerOne: you declare what you want and PowerOne prepares it for you.

Beyond the API, PowerOne also has a UI that lets you declare outcomes. It's called the PowerOne Navigator and is built on top of the PowerOne API. You can even try it out on the Dell Demo Center by looking for interactive demo ITD-0315 PowerOne Navigator. The PowerOne Navigator lets you take a declarative approach to IT infrastructure through a UI. (Seriously, go check out the demo.)

Dell EMC PowerOne Navigator Overview screen

You're probably wondering how all of these outcomes are delivered and made available for consumption. The resources (storage, compute, and network) are logically configured together as a CRG, or Cluster Resource Group. This is a logical grouping of infrastructure resources and their settings, and you can have a bunch of CRGs.

Once a CRG has been declared, the hosts in it have VMware ESXi automatically installed on them and they are added to a vCenter for management. All of that is done automatically without anything more than declaring an outcome. The IT staff no longer need to know about all the obscure best practices for creating vSphere environments. PowerOne already takes those into account.

Below is the starting screen for creating a CRG. Notice that I declare a name, a version of vSphere, and whether I want to base its design on any other CRGs I already have. A few more screens and I have a functional CRG with ESXi installed on it.

Dell EMC PowerOne Navigator create CRG dialog
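
For a rough idea of what those dialog inputs boil down to, here's a hypothetical sketch in PowerShell. The endpoint path and property names simply mirror the Navigator fields (name, vSphere version, base CRG); they are placeholders I invented for illustration, not the actual PowerOne API schema.

# Hypothetical CRG declaration mirroring the Navigator dialog fields; the
# endpoint and property names are placeholders, not the actual PowerOne schema.
$controller = "https://powerone-controller.example.local"   # placeholder hostname, as above
$cred       = Get-Credential                                 # PowerOne Controller credentials

$crg = @{
    name           = "analytics-crg-01"   # name for the new CRG
    vsphereVersion = "6.7"                # which vSphere/ESXi release to install
    baseCrg        = $null                # optionally base the design on an existing CRG
} | ConvertTo-Json

Invoke-RestMethod -Uri "$controller/api/crgs" -Method Post `
    -Body $crg -ContentType "application/json" -Credential $cred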

That brings up something that is really cool about PowerOne. You may have noticed that it asks what version of ESXi I want installed on the CRG. PowerOne allows for multiple versions of vSphere in an environment. That means when the organization is ready to move a given workload to a new vSphere version, it can update that CRG without affecting workloads on other CRGs.

Does that mean each CRG has its own vCenter server instance? No, by default there are only two vCenter instances in PowerOne and they are located in a vSphere Management Cluster. One vCenter appliance is responsible for controlling the vSphere Management Cluster and the other is responsible for all of the other CRGs.

In addition, the vSphere Management Cluster holds the NSX-v components, vROps nodes, and a data protection VM. This leaves some space for additional management workloads to be installed on the vSphere Management Cluster.

There are a lot of other cool things in PowerOne I could share, but this is already a fairly long post, so we'll save those for another time. If there is something in particular you'd like to know about PowerOne, let me know in the comments or DM me on Twitter at @wonder_nerd.

May your servers keep running and your data center always be chilled.

A special thanks to Bob Percy for reviewing this post prior to publication.

Permanent link to this article: https://www.wondernerd.net/dell-emc-powerone/

Workspace One Apple iOS 9.3 Enrollment


I thought I would do a quick blog post about enrolling iOS 9.3 devices into VMware Workspace One (WS1). iOS 9.3 is the last release available for several generations of devices, and there are many iOS devices still out there running iOS 9.3.x. That means if you are doing mobile device management (MDM) in your organization, you may run into something that I ran into this week.

I'm going to say this up front… iOS 9.3 is supported with WS1 9.6. In fact, support goes all the way back to iOS 7. (If you are on iOS 7, consider upgrading to iOS 9 if at all possible; it provides a lot of great functionality.) And if you are an end user reading this blog because your IT team said your device is too old, point them to the VMware documentation.

That said, here's what I ran into. When enrolling an iOS 9.3 device with the AirWatch Agent, I would make it to step 5 in the list, at which point the agent launches a Safari browser to reach the organization's AirWatch MDM site so the device can download an MDM profile. Once it got to the site, the progress bar would only advance to about 66% and then stop responding. If you tried to do anything, Safari would crash and you would need to restart the enrollment process from the beginning.

This was on a clean iOS device that had been rebooted prior to installing the AirWatch agent. What could be wrong? After verifying it wasn't an ID10T error on the user's part, the next step was to clear the browser cache. To do this, go to Settings, scroll down to Safari and tap on it, then scroll down to "Clear History and Website Data" and tap that. This clears the cache and should hopefully fix any issues with stale files that haven't aged off.

That, of course, didn't work for the user. What could be the problem? What else to try… I had the user reboot the iOS device after clearing the Safari cache. This time things proceeded past step 5, and the device was able to download and install the MDM profile and finish the rest of the enrollment process.

That's right: if you run into an issue with your AirWatch enrollment, try clearing the cache and then rebooting the device.


Yup, it's that line from the IT Crowd… "Have you tried turning it off and back on again?"

The interesting thing about all of this is that the user was me. I was enrolling an iOS device and this bit me. I thought I would share in case you've been bitten by this as well.

May your servers keep running and your data center always be chilled.

Permanent link to this article: https://www.wondernerd.net/workspace-one-apple-ios-9-3-enrollment/

VMTN5019U The Thrifty Admin, VDI by Day Compute by Night


These are the slides from my VDI by Day Compute by Night vBrownBag presentation. If you have questions, please let me know by posting in the comments section or using the contact me page.

Here is my blog post detailing my code for VDI by Day Compute by Night. Be sure to visit my GitHub site for the current release of code for this project.

May your servers keep running and your data center always be chilled.

Permanent link to this article: https://www.wondernerd.net/vmtn5019u-the-thrifty-admin/