My NVIDIA GRID 5.0 Testing


In my previous post I covered some of what is new with NVIDIA GRID 5.0 and the NVIDIA Pascal cards. In this post I’m going to cover some of the testing I’ve done with GRID 5.0 and the NVIDIA Tesla P4 GPU.

NVIDIA was nice enough to provide the NVIDIA GRID Community Advisors (NGCA) access to P4 GPUs and beta candidates of GRID 5.0 for testing in our home labs. This was something I couldn’t pass up, which meant I had to stand up a home lab.

It’s interesting: this cutting-edge frontier is one of the few spaces that still requires you to have physical equipment to do testing. I can’t go out and run this in a cloud somewhere and test it. So for the last month, my wonderful wife has tolerated the sound of servers running in our basement while I put GRID 5.0 through its paces.

That said, let’s talk about my lab setup. Data for this post actually comes from two different labs; one was a work lab that has since been repurposed. Currently I just have a home lab. (Skip ahead 5 paragraphs if you aren’t interested in my home lab hardware.)

What am I running to do all this GRID 5.0 testing… maybe something like some Dell R730s loaded with NVIDIA P40s or P100s? Not really. My home lab, like most folks’ home labs, is made up of eBay specials. That said, it’s worth noting this entire setup CANNOT be found on any HCL for VMware or NVIDIA.

My home lab consists of an R610 running a management environment (VSA, jump box, AD, Connection Server, Security Server, NVIDIA License Servers, etc.). It has dual Intel E5620 quad-core processors, 24GB of RAM, and five 146GB SAS drives in a RAID 5 configuration. All of the VMs are running on local storage. Its only job is to support management.

This is connected to the world using a 1Gb ZyXEL 16-port switch and an old D-Link router I had sitting around. It’s a standard networking setup for 1Gb ports, with nothing really special to tell you about.

The other system in my arsenal is a Cisco C240-M3. It’s running Intel E5-2640 processors at 2.50GHz with 6 cores each. The system has 64GB of memory, and I have it loaded with two 74GB SAS drives in a RAID 1 for my boot volume and three 146GB SAS drives in a RAID 5 for my storage volume. The nice thing about this server is that it supports the NVIDIA K1 and K2 cards should I need them for testing. I picked this up on eBay for about $500 plus drives (http://www.ebay.com/itm/Cisco-UCS-C240-M3S-v02-Server-2x-E5-2640-2-50-GHz-6-Core-64GB-Dual-PS-No-HDD/262840746028).

If you hadn’t guessed by now my testing was done using VMware Horizon 7.1 (build-5170113). My vCenter is version 6.5 (build 5705665) and the ESXi hosts are at version 6.5 (C240 is build 5310538 and the R610 is build 4887370).

This is where the standard stuff ends and the fun begins…

The P4 GPUs that NVIDIA was nice enough to provide the NGCA members with went in my C240-M3 host. I’m going to save the install and setup of the GPU for another post. The C240-M3 host only runs my test VMs so that I can avoid artifacts caused by other VMs that aren’t part of what I’m testing. With the P4 installed and configured in the C240-M3 I built some CentOS 7 VMs.

(Screenshot: NVIDIA X Server Settings screen)

To test my VMs I used GFXBench and Unigine Heaven 4.0; for both I used the Linux versions of the software. I chose GFXBench because of its testing methodology: it has tests for several different GPU factors, and they aren’t all rolled into a single test. I also chose to test with Unigine because it’s what everyone else tests with, and I want to make this information relevant to as many of you as possible.

For my basic tests I kept the RAM in each VM at 8GB and the vCPUs at 4, with a native screen resolution of 1920×1080 (16:9) at 29Hz. (See the NVIDIA X Server Settings screenshot.)

The table below shows the GFXBench results for each of its tests for each vGPU profile in VMware Horizon.

GFXBench test results*

| Test | P4-8Q | P4-4Q | P4-2Q | P4-1Q |
|---|---|---|---|---|
| Car Chase | 6983.29 Frames (118.161 FPS) | 6940.41 Frames (117.435 FPS) | 6953.41 Frames (117.655 FPS) | 6997.82 Frames (118.406 FPS) |
| 1080p Car Chase Off screen | 9916.48 Frames (167.791 FPS) | 9924.63 Frames (167.93 FPS) | 9886.87 Frames (167.292 FPS) | 9935.8 Frames (168.118 FPS) |
| 1440p Manhattan 3.1.1 Off screen | 8828.87 Frames (142.401 FPS) | 8796.17 Frames (141.874 FPS) | 8801.01 Frames (141.952 FPS) | 8802.01 Frames (141.968 FPS) |
| Manhattan 3.1 | 11092.4 Frames (178.91 FPS) | 10915.6 Frames (176.058 FPS) | 11035.8 Frames (177.997 FPS) | 11068.1 Frames (178.518 FPS) |
| 1080p Manhattan 3.1 Off screen | 13633 Frames (219.889 FPS) | 13623.2 Frames (219.73 FPS) | 13476.3 Frames (217.359 FPS) | 13559.1 Frames (218.696 FPS) |
| Manhattan | 12076.2 Frames (194.778 FPS) | 11640.7 Frames (187.753 FPS) | 11585.2 Frames (186.858 FPS) | 11774.7 Frames (189.914 FPS) |
| 1080p Manhattan Off screen | 14835.3 Frames (239.279 FPS) | 14671.1 Frames (236.63 FPS) | 14473.8 Frames (233.449 FPS) | 14539.1 Frames (234.502 FPS) |
| T-Rex | 25533.7 Frames (455.959 FPS) | 24787.8 Frames (442.639 FPS) | 26636.3 Frames (476.136 FPS) | 24805.8 Frames (442.96 FPS) |
| 1080p T-Rex Off screen | 42027 Frames (750.482 FPS) | 41957.8 Frames (749.246 FPS) | 40691.8 Frames (726.64 FPS) | 41538.5 Frames (741.759 FPS) |
| Tessellation | 21398.3 Frames (713.276 FPS) | 21415.9 Frames (713.862 FPS) | 21883.7 Frames (729.457 FPS) | 22267.8 Frames (742.259 FPS) |
| 1080p Tessellation Off screen | 88862.3 Frames (1481.04 FPS) | 88410.1 Frames (1473.5 FPS) | 88346.9 Frames (1472.45 FPS) | 88181 Frames (1469.68 FPS) |
| ALU 2 | 18045 Frames (601.5 FPS) | 17631.8 Frames (587.726 FPS) | 17716.9 Frames (590.564 FPS) | 17850.1 Frames (595.002 FPS) |
| 1080p ALU 2 Off screen | 62342.9 Frames (1039.05 FPS) | 62537.5 Frames (1042.29 FPS) | 62582.1 Frames (1043.04 FPS) | 62643.6 Frames (1044.06 FPS) |
| Driver Overhead 2 | 2450.51 Frames (81.6837 FPS) | 2679.11 Frames (89.3036 FPS) | 2492 Frames (83.0667 FPS) | 2600.05 Frames (86.6682 FPS) |
| 1080p Driver Overhead 2 Off screen | 5102.23 Frames (85.0372 FPS) | 5589.44 Frames (93.1574 FPS) | 5194.05 Frames (86.5675 FPS) | 5368.02 Frames (89.4669 FPS) |
| Texturing | 96098 MTexel/s (63.5544 FPS) | 100066 MTexel/s (64.4044 FPS) | 100061 MTexel/s (64.9113 FPS) | 100385 MTexel/s (65.2536 FPS) |
| 1080p Texturing Off screen | 99278 MTexel/s (95.8384 FPS) | 99665 MTexel/s (95.9874 FPS) | 99797 MTexel/s (98.0892 FPS) | 99204 MTexel/s (95.8811 FPS) |
| Render Quality | 4541.54 mB PSNR (866.644 FPS) | 4541.54 mB PSNR (873.211 FPS) | 4541.54 mB PSNR (1023.98 FPS) | 4541.54 mB PSNR (977.356 FPS) |
| Render Quality (High Precision) | 4541.54 mB PSNR (887.633 FPS) | 4541.54 mB PSNR (900.2 FPS) | 4541.54 mB PSNR (1059.25 FPS) | 4541.54 mB PSNR (1008.66 FPS) |

 

Unigine Heaven was run with the settings defined in the table below.

Unigine Heaven 4.0 (Basic Edition) Settings

| Setting | Value |
|---|---|
| Preset | Custom |
| API | OpenGL (grayed out option) |
| Quality | High |
| Tessellation | Normal |
| Stereo 3D | Disabled |
| Multi-monitor | Disabled |
| Anti-aliasing | x2 |
| Full Screen | True |
| Resolution | System |

 

Testing results from Unigine Heaven

Unigine Heaven test results*

| Metric | P4-8Q | P4-4Q | P4-2Q | P4-1Q |
|---|---|---|---|---|
| FPS | 28.3 | 28.2 | 28.1 | 28.2 |
| Score | 713 | 711 | 709 | 711 |
| Min FPS | 7.2 | 12.2 | 12.5 | 11.3 |
| Max FPS | 41.9 | 44.2 | 45.5 | 42.6 |
| Mode | 1920×1080 2xAA fullscreen | 1920×1080 2xAA fullscreen | 1920×1080 2xAA fullscreen | 1920×1080 2xAA fullscreen |

The above are the results of my testing. You probably noticed the asterisks (*) on the results. This is my caveat: these aren’t results you should rely on for a production environment. I ran these tests once per profile, on non-HCL hardware, in a non-optimized configuration. The results may also be affected by the fact that no other VMs were running on the host and competing for resources during the tests. Your individual results may vary significantly. Please treat my test results as one data point and not a complete answer to how a similar configuration will perform in your environment. In short, your mileage may vary.
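If you want a rough feel for how much the numbers move between vGPU profiles, here is a minimal Python sketch. It is not part of my test harness; the `spread_percent` helper and the choice of four tests are my own illustration, and the FPS values are simply copied from the tables above. Since each profile was only run once, treat small differences as possible run-to-run noise rather than a real profile effect.

```python
# FPS per vGPU profile, in the order P4-8Q, P4-4Q, P4-2Q, P4-1Q.
# Values are copied from the result tables in this post.
FPS_BY_TEST = {
    "Unigine Heaven": [28.3, 28.2, 28.1, 28.2],
    "GFXBench Car Chase": [118.161, 117.435, 117.655, 118.406],
    "GFXBench Manhattan": [194.778, 187.753, 186.858, 189.914],
    "GFXBench T-Rex": [455.959, 442.639, 476.136, 442.96],
}


def spread_percent(values):
    """Return (max - min) as a percentage of the mean of the values."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 100.0


if __name__ == "__main__":
    for test, fps in FPS_BY_TEST.items():
        print(f"{test:20s} spread across profiles: {spread_percent(fps):.2f}%")
```

A small spread only says the four single runs landed close together; it takes repeated runs per profile to say anything stronger.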

At the beginning of this post I mentioned I had two labs I was using. Up to now you have heard about my P4 testing. As some of you may know, I presented a session at the GPU Tech Conference with my good friend Trey Johnson on “Getting started with GPUs for Linux Virtual Desktops on VMware Horizon.” I ran those tests in a work lab environment. The material for that session was run on Cisco C240-M4s with NVIDIA M60 GPUs and GRID 4.

I made one mistake that I regret: before the lab was repurposed, I forgot to capture the full set of results from my testing. All I have are the maximums and minimums that were discussed during the session. Even so, the results provide a good set of comparison points. The table below has the test results from the GTC session, showing the highest and lowest results.

M60 with GRID 4 GFXBench Test Results (incomplete)*

| Test | M60-8Q | M60-4Q |
|---|---|---|
| Texturing | 44.6732 FPS | 44.8432 FPS |
| Driver Overhead 2 | 61.3333 FPS | 61.5149 FPS |
| 1080p Texturing Off screen | 90.7743 FPS | 98.2536 FPS |
| 1080p Tessellation Off screen | 1212.87 FPS | 1212.62 FPS |

The same bit as above applies to the asterisks (*): these are single-pass results from a non-optimized environment, and your results may vary significantly.

These results line up with the results from the P4 GPU tests. You can see that the highest and lowest results for both the P4 GPU and the M60 come from the same tests. I’ve put the relevant results in a side-by-side table for comparison below.

NVIDIA P4 GPU compared to M60 GPU testing with GFXBench (incomplete)*

| Test | P4-8Q (GRID 5) | M60-8Q (GRID 4) | P4-4Q (GRID 5) | M60-4Q (GRID 4) |
|---|---|---|---|---|
| Texturing | 63.5544 FPS | 44.6732 FPS | 64.4044 FPS | 44.8432 FPS |
| Driver Overhead 2 | 81.6837 FPS | 61.3333 FPS | 89.3036 FPS | 61.5149 FPS |
| 1080p Texturing Off screen | 95.8384 FPS | 90.7743 FPS | 95.9874 FPS | 98.2536 FPS |
| 1080p Tessellation Off screen | 1481.04 FPS | 1212.87 FPS | 1473.5 FPS | 1212.62 FPS |

You might be getting tired of this by now… same bit as above with the asterisks (*): these are single-pass results from non-optimized environments, and your results may vary significantly.

As you can see from above, the P4 exceeds the M60 in the GFXBench tests in all but one instance: the 1080p Texturing Off screen test on the 4Q profile, where the M60-4Q scored 98.2536 FPS against the P4-4Q’s 95.9874 FPS. That suggests comparable performance between the NVIDIA P4 and NVIDIA M60 GPUs in single-pass, non-optimized tests.
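To put rough numbers on that comparison, here is a short Python sketch that works out the percentage difference between the P4 and the M60 for each test and profile. The `percent_change` and `compare` helpers are my own illustration, not something from GFXBench, and the FPS values are copied straight from the side-by-side table. Keep in mind these are single-pass numbers from two different labs, so the percentages are indicative at best.

```python
# FPS values copied from the side-by-side comparison table in this post.
# Layout: test name -> {profile size: (P4 FPS on GRID 5, M60 FPS on GRID 4)}
RESULTS = {
    "Texturing": {"8Q": (63.5544, 44.6732), "4Q": (64.4044, 44.8432)},
    "Driver Overhead 2": {"8Q": (81.6837, 61.3333), "4Q": (89.3036, 61.5149)},
    "1080p Texturing Off screen": {"8Q": (95.8384, 90.7743), "4Q": (95.9874, 98.2536)},
    "1080p Tessellation Off screen": {"8Q": (1481.04, 1212.87), "4Q": (1473.5, 1212.62)},
}


def percent_change(p4_fps, m60_fps):
    """Difference of the P4 result relative to the M60 result, in percent."""
    return (p4_fps - m60_fps) / m60_fps * 100.0


def compare(results):
    """Return a list of (test, profile size, percent difference) tuples."""
    rows = []
    for test, profiles in results.items():
        for size, (p4_fps, m60_fps) in profiles.items():
            rows.append((test, size, percent_change(p4_fps, m60_fps)))
    return rows


if __name__ == "__main__":
    for test, size, delta in compare(RESULTS):
        print(f"{test:30s} {size}: {delta:+.1f}%")
```

A positive value means the P4 was faster in that test; the only negative value is the 1080p Texturing Off screen test on the 4Q profile.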

To put this in perspective, NVIDIA basically provided half an M60 (power, slot space, etc.) in the P4 and met or exceeded its vGPU performance. Now think about what that means for servers… you can put GPUs in servers for EUC deployments that you couldn’t before (consult vendor documentation for limits and compatibility). This is nice when, for instance, you need to upgrade some lower-end applications that take advantage of GPUs (for example Microsoft Office, Windows 10 (OK, that’s an operating system, you caught me), and the like) and the hosts for those desktops are a year or two old. Instead of a rip and replace, add P4s or P40s to the servers (depending on support) and away you go.

I can’t remember who said it at the GPU Tech Conference this year, but it really stood out to me. It went something like this: “We’ve entered a new age in the computer industry, an age where servers won’t be sold without a GPU for processing.” In my opinion, this latest release of GRID 5.0 and the Pascal GPUs make vGPU-based EUC accessible for all but a couple of corner cases. Going forward, adding GPUs should be a requirement for EUC deployments.

I hope this blog post was helpful. If you would like to find out more be sure to read these other great posts about NVIDIA GRID 5.0 from other NGCA members:

Permanent link to this article: https://www.wondernerd.net/my-nvidia-grid-5-0-testing/