Most of our GPU coverage focuses on the consumer side of the business and on game benchmarking, but I promised to examine the compute side of performance back when the Radeon VII launched. With the 5700 XT having debuted recently, we had an opportunity to return to this question with a new GPU architecture from AMD and compare RDNA against GCN.
In fact, the overall compute situation is at an interesting crossroads. AMD has declared that it wishes to be a more serious player in enterprise compute environments but has also said that GCN will continue to exist alongside RDNA in this space. The Radeon VII is a consumer variant of AMD’s MI50 accelerator, with half-speed FP64 support. If you know you need double-precision (FP64) compute, the Radeon VII fills that niche in a way that no other GPU in this comparison does.
The Radeon VII has the highest RAM bandwidth and it’s the only GPU in this comparison to offer much in the way of double-precision performance. But while these GPUs have relatively similar on-paper specs, there’s significant variance between them in terms of performance — and the numbers don’t always break the way you think they would.
One of AMD’s major talking points with the 5700 XT is how Navi represents a fundamentally new GPU architecture. The 5700 XT proved itself to be moderately faster than the Vega 64 in our testing on the consumer side of the equation, but we wanted to check the situation in compute as well. Keep in mind, however, that the 5700 XT’s newness also works against us a bit here. Some applications may need to be updated to take full advantage of its capabilities.
Regarding Blender 2.80
Our test results contain data from both Blender 2.80 and the standalone Blender benchmark, 1.0beta2 (released August 2018). Blender 2.80 is a major release for the application, and it contains a number of significant changes. The standalone benchmark is not compatible with Nvidia’s RTX family, which necessitated testing with the latest version of the software. Initially, we tested the Blender 2.80 beta, but then the final version dropped — so we dumped the beta results and retested.
There are significant performance differences between the Blender 1.0beta2 benchmark and 2.80, and one scene, Classroom, does not render properly in the new version. This scene has been dropped from our 2.80 comparisons. Blender allows the user to specify a tile size in pixels to control how much of the scene is worked on at once. Code in the Blender 1.0beta2 benchmark’s Python files indicates that the test uses a tile size of 512×512 (X/Y dimensions) for GPUs and 16×16 for CPUs. Most of the scene files contained within the benchmark, however, use a tile size of 32×32 by default if loaded within Blender 2.80.
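To give a rough sense of what those tile sizes mean in practice, here is a minimal sketch that counts how many tiles are needed to cover a frame. The 1920×1080 resolution is an assumption chosen purely for illustration (the benchmark scenes may render at other resolutions), and `tile_count` is a hypothetical helper, not part of Blender’s API.

```python
import math

def tile_count(width, height, tile_x, tile_y):
    """Number of tiles needed to cover a width x height frame.

    Partial tiles at the right and bottom edges still count as tiles.
    """
    return math.ceil(width / tile_x) * math.ceil(height / tile_y)

# Hypothetical frame size (assumption); the tile sizes are those named in the text.
FRAME_WIDTH, FRAME_HEIGHT = 1920, 1080

for tile in (16, 32, 512):
    count = tile_count(FRAME_WIDTH, FRAME_HEIGHT, tile, tile)
    print(f"{tile}x{tile}: {count} tiles")
```

At this assumed resolution, a 16×16 tile size splits the frame into thousands of small work items, while 512×512 yields only a dozen large ones, which is why the setting can interact so differently with different GPU architectures.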
We tested Blender 2.80 in two different modes. First, we tested all compatible scenes using the default tile size those scenes loaded with. This was 16×16 for Barbershop_Interior, and 32×32 for all other scenes. Next, we tested the same renders with a tile size of 512×512. Up until now, the rule with tile sizes has been that larger sizes were good for GPUs, while smaller sizes were good for CPUs. This appears to have changed somewhat with Blender 2.80. AMD and Nvidia GPUs show very different responses to larger tile sizes, with AMD GPUs accelerating with larger tile sizes and Nvidia GPUs losing performance.
Because the scene files we are testing were created in an older version of Blender, it’s possible that this is affecting our overall results. We have worked extensively with AMD for several weeks to explore aspects of Blender performance on GCN GPUs. GCN, Pascal, Turing, and RDNA all show a different pattern of results when moving from 32×32 to 512×512, with Turing losing less performance than Pascal and RDNA gaining more performance in most circumstances than GCN.
All of our GPUs benefited substantially from not using a 16×16 tile size for Barbershop_Interior. While this test defaults to 16×16, it does not render very well at that tile size on any GPU.
Troubleshooting the different results we saw in the Blender 1.0beta2 benchmark versus the Blender 2.80 beta and finally Blender 2.80 final has held up this review for several weeks, and we’ve swapped through several AMD drivers while working on it. All of our Blender 2.80 results were, therefore, run using Adrenalin 2019 Edition 19.8.1.
Test Setup and Notes
All GPUs were tested on an Intel Core i7-8086K system using an Asus Prime Z370-A motherboard. The Vega 64, Radeon RX 5700 XT, and Radeon VII were all tested using Adrenalin 2019 Edition 19.7.2 (7/16/2019) for everything but Blender 2.80. All Blender 2.80 tests were run using 19.8.1, not 19.7.2. The Nvidia GeForce GTX 1080 and Gigabyte Aorus RTX 2080 were both tested using Nvidia’s 431.60 Game Ready Driver (7/23/2019).
CompuBench 2.0 runs GPUs through a series of tests intended to measure various aspects of their compute performance. Kishonti, developers of CompuBench, don’t appear to offer any significant breakdown on how they’ve designed their tests, however. Level set simulation may refer to using level sets for the analysis of surfaces and shapes. Catmull-Clark Subdivision is a technique used to create smooth surfaces. N-body simulations are simulations of dynamic particle systems under the influence of forces like gravity. TV-L1 optical flow is an implementation of an optical flow estimation method, used in computer vision.
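For readers unfamiliar with the N-body workload, the following is a minimal sketch of a direct-sum gravitational simulation. It is not CompuBench’s implementation, which Kishonti does not document in detail; the function names, the unit gravitational constant, and the softening term are all assumptions made for illustration.

```python
G = 1.0  # gravitational constant in arbitrary units (assumption)

def accelerations(positions, masses, softening=1e-3):
    """Direct-sum gravitational acceleration on every body, O(n^2) pair work."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector pointing from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            # Softening avoids a division by zero when bodies get very close.
            dist_sq = sum(c * c for c in d) + softening ** 2
            inv_dist_cubed = dist_sq ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_dist_cubed
    return acc

def step(positions, velocities, masses, dt):
    """Advance the system in place by one semi-implicit Euler time step."""
    acc = accelerations(positions, masses)
    for i in range(len(positions)):
        for k in range(3):
            velocities[i][k] += acc[i][k] * dt
            positions[i][k] += velocities[i][k] * dt

# Two equal-mass bodies at rest; gravity pulls them toward each other.
positions = [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
velocities = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
masses = [1.0, 1.0]
step(positions, velocities, masses, dt=0.01)
```

Every body interacts with every other body, so the work grows with the square of the body count. Each pairwise calculation is independent, which is what makes this kind of workload a natural fit for GPUs.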
SPEC Workstation 3.1 contains many of the same workloads as SPECViewPerf, but also has additional GPU compute workloads, which we’ll break out separately. A complete breakdown of the workstation test and its application suite can be found here. SPEC Workstation 3.1 was run in its 4K native test mode. While this test run was not submitted to SPEC for formal publication, our testing of SPEC Workstation 3.1 obeyed the organization’s stated rules for testing, which can be found here.
Nvidia GPUs were always tested with CUDA when CUDA was available.
We’ve cooked up two sets of results for you. The first is a synthetic series of benchmarks, created with SiSoft Sandra, that investigates how these chips compare in processing power, memory latency, and internal characteristics. The second is a wider suite of tests that touch on compute and rendering performance in various applications. Since the SiSoft Sandra 2020 tests are all unique to that application, we’ve opted to break them out into their own slideshow.
The Gigabyte Aorus RTX 2080 results should be read as approximately equivalent to an RTX 2070S. The two GPUs perform nearly identically in consumer workloads and should match each other in workstation workloads as well.
SiSoft Sandra 2020
SiSoft Sandra is a general-purpose system information utility and full-featured performance evaluation suite. While it’s a synthetic test, it’s probably the most full-featured synthetic evaluation utility available, and Adrian Silasi, its developer, has spent decades refining and improving it, adding new features and tests as CPUs and GPUs evolve.
Our SiSoft Sandra-specific results are below. Some of our OpenCL results are a little odd where the 5700 XT is concerned, but according to Adrian, he’s not yet had the chance to optimize code for execution on the 5700 XT. Consider these results to be preliminary — interesting, but perhaps not yet indicative — as far as that GPU is concerned.
Our SiSoft Sandra 2020 benchmarks point largely in the same direction. If you need double-precision floating-point, the Radeon VII is a compute monster. While it’s not clear how many buyers fall into that category, there are certain places, like image processing and high-precision workloads, where the Radeon VII shines.
The RDNA-based Radeon 5700 XT does less to distinguish itself in these tests, but we’re also in contact with Silasi concerning the issues we ran into during testing. Improved support may change some of these results in the months ahead.
Now that we’ve addressed Sandra performance, let’s turn to the rest of our benchmark suite. Our other results are included in the slideshow below:
What do these results tell us? A lot of rather interesting things. First of all, RDNA is downright impressive. Keep in mind that we’ve tested this GPU in professional and compute-oriented applications, none of which have been updated or patched to run on it. There are clear signs that this has impacted our benchmark results, including some tests that either wouldn’t run or ran slowly. Even so, the 5700 XT impresses.
Radeon VII impresses too, but in different ways than the 5700 XT. SiSoft Sandra 2020 shows the advantage this card can bring to double-precision workloads, where it offers far more performance than anything else on the market. AI and machine learning have become much more important of late, but if you’re working in an area where GPU double-precision is key, Radeon VII packs an awful lot of firepower. SiSoft Sandra does include tests that rely on D3D11 rather than OpenCL. But given that OpenCL is the chief competitor to CUDA, I opted to stick with it in all cases save for the memory latency tests, which globally showed lower latencies for all GPUs when D3D was used compared with OpenCL.
AMD has previously said that it intends to keep GCN in-market for compute, with Navi oriented towards the consumer market, but there’s no indication that the firm intends to continue evolving GCN on a separate trajectory from RDNA. The more likely meaning of this is that GCN won’t be replaced at the top of the compute market until Big Navi is ready at some point in 2020. Based on what we’ve seen, there’s a lot to be excited about on that front. There are already applications where RDNA is significantly faster than Radeon VII, despite the vast difference between the cards in terms of double-precision capability, RAM bandwidth, and memory capacity.
Blender 2.80 presents an interesting series of comparisons between RDNA, GCN, and CUDA. Using higher tile sizes has an enormous impact on GPU performance, but whether that difference is good or bad depends on which brand of GPU you use and which architectural family it belongs to. Pascal and Turing GPUs performed better with smaller tile sizes, while GCN GPUs performed better with larger ones. The 512×512 tile size was better in total for all GPUs, but only because it improved the total rendering time on Barbershop_Interior by more than it harmed the render time of every other scene for Turing and Pascal GPUs. The RTX 2080 was the fastest GPU in our Blender benchmarks, but the 5700 XT put up excellent performance results overall.
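The point about total render time is easier to see with numbers. The sketch below uses entirely hypothetical render times in seconds (they are not measurements from this review), and `Scene_A` and `Scene_B` are placeholder names standing in for the other scenes. It shows how one large improvement can outweigh several smaller regressions.

```python
# Hypothetical render times in seconds for one Nvidia-style GPU.
# These values are illustrative assumptions, NOT results from our testing.
default_tiles = {"Barbershop_Interior": 1500.0, "Scene_A": 200.0, "Scene_B": 300.0}
large_tiles = {"Barbershop_Interior": 900.0, "Scene_A": 240.0, "Scene_B": 350.0}

def per_scene_delta(before, after):
    """Change in render time per scene; negative means the scene got faster."""
    return {scene: after[scene] - before[scene] for scene in before}

def total_delta(before, after):
    """Net change in total render time across all scenes."""
    return sum(per_scene_delta(before, after).values())

for scene, delta in per_scene_delta(default_tiles, large_tiles).items():
    print(f"{scene}: {delta:+.0f} s")
print(f"Total: {total_delta(default_tiles, large_tiles):+.0f} s")
```

In this made-up case, two of the three scenes render more slowly at the larger tile size, yet the total still drops because the gain on Barbershop_Interior dwarfs the losses elsewhere. A per-scene view is therefore more informative than the aggregate alone.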
I do not want to make global pronouncements about Blender 2.80 settings; I am not a 3D rendering expert. These test results suggest that Blender performs better with larger tile settings on AMD GPUs but that smaller tile settings may produce better results for Nvidia GPUs. In the past, both AMD and Nvidia GPUs have benefited from larger tile sizes. This pattern could also be linked to the specific scenes in question, however. If you run Blender, I suggest experimenting with different scenes and tile sizes.
Ultimately, what these results suggest is that there’s more variation in GPU performance in some of these professional markets than we might expect for gaming. There are specific tests where the 5700 XT is markedly faster than the RTX 2080 or Radeon VII and other tests where it falls sharply behind them. OpenCL driver immaturity may account for some of this, but we see flashes of brilliance in these performance figures. The Radeon VII’s double-precision performance puts it in a class of its own in certain respects, but the Radeon RX 5700 XT is a far less expensive and quieter card. Depending on what your target application is, AMD’s new $400 GPU might be the best choice on the market. In other scenarios, both the Radeon VII and the RTX 2080 can claim to be the fastest card available.
Feature image is the final render of the Benchmark_Pavilion scene included in the Blender 1.0beta2 standalone benchmark.