Introduction
Blackmagic's DaVinci Resolve is known for how well it utilizes multiple GPUs to improve performance, but our previous testing found that the scaling was not nearly as dramatic as many claim. However, since that was some of our first in-depth testing of DaVinci Resolve, there are several areas we want to expand on that may affect our results.
First, we have recently revamped our DaVinci Resolve testing process to better reflect realistic workloads. Not only did we add things like OpenFX, we also dramatically increased the number of codecs tested, adding ProRes 4444, CinemaDNG, ARRIRAW, and several RED compression levels. In addition, we tested the RAW footage at both "Full Res." and "Half Res." decode quality in case that alters performance.
Second, since our previous GPU scaling testing, NVIDIA has released the Titan V 12GB, which showed terrific performance gains in DaVinci Resolve when we compared it to a range of GeForce cards a month ago. However, due to the high cost of that card, we are also including the GTX 1080 Ti, which gives terrific performance for its cost.
Lastly, we will be looking at three different CPUs and their associated platforms: Core i9, Xeon W, and Dual Xeon SP. This provides a range of not only raw CPU power, but also PCI-E configurations.
Test Hardware & Methodology
To see how DaVinci Resolve scales with multiple GPUs across various platforms, we opted to test 1-4 GTX 1080 Ti and Titan V GPUs with Core i9, Xeon W, and Dual Xeon SP platforms:
Test Platforms | Core i9 | Xeon W | Dual Xeon SP |
Motherboard: | Gigabyte X299 AORUS 7 (rev 1.0) | Gigabyte MW51-HP0 (Rev. 1.0) | ASUS WS C621E SAGE |
CPU: | Intel Core i9 7960X | Intel Xeon W-2175 2.5GHz (4.3GHz Turbo) 14 Core | 2x Intel Xeon Gold 6148 2.4GHz (3.7GHz Turbo) 20 Core |
RAM: | 8x DDR4-2666 16GB (128GB Total) | 8x DDR4-2666 32GB ECC Reg. (256GB Total) |
Video Card: | 1-4x NVIDIA GeForce GTX 1080 Ti 11GB or 1-4x NVIDIA Titan V 12GB |
Hard Drive: | Samsung 960 Pro 1TB M.2 PCI-E x4 NVMe SSD |
OS: | Windows 10 Pro 64-bit |
Software: | DaVinci Resolve 14.3.0.014 |
All of our test footage is downloaded or transcoded from publicly available media. We did this so that anyone can repeat our testing, both to verify our findings and to see how their current computer stacks up to the latest hardware available. To test each type of footage, we used three different "levels" of grading. The lowest level is simply a basic correction using the color wheels plus 4 Power Window nodes that include motion tracking. The next level up has the same adjustments with the addition of 3 OpenFX nodes: Lens Flare, Tilt-Shift Blur, and Sharpen. The final level has all of the previous nodes plus one TNR node.
Performance was measured in the Color tab using the built-in FPS counter. After playback was started, we waited 15 seconds for the FPS to stabilize then recorded the lowest FPS number over the next 15 seconds. This method allowed us to achieve highly consistent and replicable results.
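The measurement rule above (a 15-second warm-up, then the lowest reading over the next 15 seconds) can be written as a small function. This is a sketch that assumes the on-screen FPS readings have already been logged as (seconds since playback started, FPS) pairs. The function name and input format are hypothetical and are not part of Resolve.

```python
from typing import Sequence, Tuple


def lowest_fps_in_window(
    samples: Sequence[Tuple[float, float]],
    warmup_s: float = 15.0,
    window_s: float = 15.0,
) -> float:
    """Return the minimum FPS reading taken during the measurement window.

    `samples` holds (seconds_since_playback_start, fps_reading) pairs.
    Readings during the warm-up period and after the window closes are ignored.
    """
    start, end = warmup_s, warmup_s + window_s
    in_window = [fps for t, fps in samples if start <= t < end]
    if not in_window:
        raise ValueError("no FPS readings fall inside the measurement window")
    return min(in_window)


# Hypothetical log: one reading per second at 24 FPS, with two dips.
samples = [(float(t), 24.0) for t in range(35)]
samples[5] = (5.0, 12.0)    # dip during warm-up: ignored
samples[20] = (20.0, 22.5)  # dip inside the window: this is what gets recorded
print(lowest_fps_in_window(samples))  # 22.5
```

Taking the minimum rather than the average makes the number a conservative "worst case during steady playback" figure, which is what matters for real-time grading.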
For all the RAW footage we tested (CinemaDNG, ARRIRAW, and RED), we tested with the RAW decode quality set to both "Full Res." and "Half Res." ("Half Res. Good" for the RED footage). Full resolution decoding should show the largest performance delta between the different GPUs, but we also wanted to see what kind of FPS increase you might get by running at a lower decode resolution with different CPU and GPU combinations.
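Together, the platforms, GPU configurations, grading levels, and decode qualities form a full factorial test matrix. The sketch below enumerates that matrix for the 4K media. The labels are our own shorthand (for example, RED "Half Res. Good" is shortened to "Half Res."). The count it prints is the size of the complete grid, 648 runs, which is consistent with the "over 600" 4K data points reported in the analysis.

```python
from itertools import product

PLATFORMS = ["Core i9 7960X", "Xeon W-2175", "2x Xeon Gold 6148"]
GPU_MODELS = ["GTX 1080 Ti", "Titan V"]
GPU_COUNTS = [1, 2, 3, 4]

# Grading levels are cumulative: each one contains all nodes of the level before it.
BASIC = ["Color wheels"] + [f"Power Window {i} (tracked)" for i in range(1, 5)]
OPENFX = BASIC + ["Lens Flare", "Tilt-Shift Blur", "Sharpen"]
TNR = OPENFX + ["Temporal noise reduction (TNR)"]
GRADES = {"Basic": BASIC, "OpenFX": OPENFX, "TNR": TNR}

# 4K clips; RAW formats are run at both decode qualities.
MEDIA_4K = ["DNxHR HQ", "ProRes 422 HQ", "ProRes 4444"] + [
    f"{raw} ({quality})"
    for raw in ["RED 11:1", "RED 7:1", "CinemaDNG"]
    for quality in ["Full Res.", "Half Res."]
]


def build_matrix(media):
    """Every (platform, GPU model, GPU count, grading level, media) combination."""
    return list(product(PLATFORMS, GPU_MODELS, GPU_COUNTS, GRADES, media))


runs_4k = build_matrix(MEDIA_4K)
print(len(runs_4k))  # 3 * 2 * 4 * 3 * 9 = 648
```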
The footage used in our testing is shown below with links to where you can download it yourself:
Codec | Resolution | FPS | Camera | Clip Name | Source |
ProRes 422 HQ | 3840×2160 | 24 fps | Ursa Mini 4K | City Train Station | Blackmagic Design Production Camera 4K Update |
ProRes 4444 | 3840×2160 | 59.94 fps | Canon C200 | Untitled00024199 | 4K Shooters Canon C200 Raw Footage Workflow |
CinemaDNG | 4608×2592 | 24 fps | Ursa Mini 4K | Interior Office | Blackmagic Design [Direct Download] |
ARRIRAW | 6560×3100 | 23.976 fps | ALEXA 65 | A003C025 (Open Gate spherical) | ARRI ALEXA Sample Footage |
RED | 3840×2160 (11:1) | 23.976 fps | EPIC DRAGON | A016_C001_02073O_001 | RED Sample R3D Files |
RED | 4096×2304 (7:1) | 29.97 fps | RED ONE MYSTERIUM | A004_C186_011278_001 | RED Sample R3D Files |
RED | 6144×3160 (12:1) | 23.976 fps | EPIC DRAGON | A007_C115_07181B_001 | RED Sample R3D Files |
RED | 6144×3077 (7:1) | 23.976 fps | WEAPON 6K | S005_L001_0220LI_001 | RED Sample R3D Files |
RED | 8192×4096 (12:1) | 23.976 fps | WEAPON 8K S35 | S002_C074_02065Z_001 | RED Sample R3D Files |
RED | 8192×4320 (9:1) | 25 fps | WEAPON 8K S35 | B001_C096_0902AP_001 | RED Sample R3D Files |
RED | 8192×4320 (7:1) | 23.976 fps | EPIC-W 8K S35 | S002_C074_02065Z_001 | RED Sample R3D Files |
DNxHR HQ 8-bit | 3840×2160 | 29.97 fps | Transcoded from RED A004_C186_011278_001 | | |
DNxHR HQ 8-bit | 6144×3160 | 23.976 fps | Transcoded from RED A007_C115_07181B_001 | | |
DNxHR HQ 8-bit | 8192×4320 | 25 fps | Transcoded from RED B001_C096_0902AP_001 | | |
While this is by no means every codec available, we feel that it covers a wide range of footage that many users work with on a daily basis. In the future we may cut down on the number of RED clips and replace them with something like XAVC-S or AVCHD, but for now we really wanted to see how the different compression levels impact performance.
4K Media – Live Playback FPS (RAW DATA)
Charts (live playback FPS per codec): 4K DNxHR HQ; 4K ProRes 422 HQ; 4K ProRes 4444; 4K RED 11:1 (Full Res. and Half Res.); 4K RED 7:1 (Full Res. and Half Res.); 4K CinemaDNG (Full Res. and Half Res.)
4K Media – Live Playback FPS (Analysis)
Since our 4K testing alone contains over 600 data points across six different codecs, it can be difficult to pull meaningful conclusions from the raw data. If you primarily work with just one of the codecs we tested, we highly recommend looking at that codec's results directly. For a more general take on GPU scaling in DaVinci Resolve on each platform, however, we averaged the results across all types of media.
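The averaging step amounts to grouping results by test configuration and taking the arithmetic mean across media types. This is a minimal sketch that assumes each result is stored as a dict with the hypothetical keys shown. The FPS values are made up for illustration and are not our measured results.

```python
from collections import defaultdict
from statistics import mean


def average_by_config(results):
    """Average FPS across media types for each test configuration.

    `results` is an iterable of dicts with the (hypothetical) keys
    platform, gpu, count, grade, media and fps. Each media type gets equal
    weight, regardless of its native frame rate.
    """
    groups = defaultdict(list)
    for r in results:
        key = (r["platform"], r["gpu"], r["count"], r["grade"])
        groups[key].append(r["fps"])
    return {key: mean(fps_values) for key, fps_values in groups.items()}


# Made-up numbers for illustration only, not our measured results.
rows = [
    {"platform": "Core i9", "gpu": "Titan V", "count": 1, "grade": "TNR",
     "media": "ProRes 422 HQ", "fps": 24.0},
    {"platform": "Core i9", "gpu": "Titan V", "count": 1, "grade": "TNR",
     "media": "ProRes 4444", "fps": 40.0},
]
print(average_by_config(rows))  # {('Core i9', 'Titan V', 1, 'TNR'): 32.0}
```

Because every media type is weighted equally, a clip with a higher native frame rate (such as the 59.94 fps ProRes 4444 footage) can pull the average up further than the 24 fps clips can. This is one more reason to check the per-codec charts for the formats you actually use.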
Starting with relatively simple color grading using the color wheels and 4 Power Windows, there is actually very little to talk about. With this level of grading, we were running at full FPS (or very near it) with nearly every GPU and CPU combination we tested. The only thing worth pointing out is that the Dual Xeon Gold 6148 system slightly under-performed, but this was entirely due to the ProRes 4444 test.
Adding 3 OpenFX effects, we start to see a bit of a difference with more GPUs – although interestingly the CPU itself made very little impact on performance. On the GPU side, we saw a decent performance gain going from one GTX 1080 Ti to two, but minimal gains adding a third and fourth GPU. With the Titan V, there was a very small increase in performance going from one GPU up to two, three, and four GPUs, but the difference was only around 2 FPS in total.
One thing we want to point out is that most of our test media is 24-25 FPS and – with the exception of ProRes 422 HQ – we were able to achieve full playback FPS with just two GTX 1080 Ti GPUs or a single Titan V. The one test with a higher framerate (ProRes 4444 at 59.94 FPS) actually saw pretty decent scaling all the way up to 4 GPUs. So it isn't really that Resolve doesn't scale, but rather that more than two GPUs are not necessary to achieve 24-25 FPS with this level of grading.
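The point that scaling only looks flat because playback is capped at the clip's native frame rate can be shown with a toy model. This sketch assumes perfectly linear GPU scaling, and the 20 FPS per card is an arbitrary hypothetical value rather than anything we measured.

```python
def playback_fps(source_fps: float, per_gpu_fps: float, gpus: int) -> float:
    """Idealized model: GPU throughput scales perfectly with the number of
    GPUs, but playback can never run faster than the clip's native frame rate."""
    return min(source_fps, per_gpu_fps * gpus)


PER_GPU_FPS = 20.0  # hypothetical throughput of one card, not a measurement

for source_fps in (24.0, 59.94):
    print(source_fps, [playback_fps(source_fps, PER_GPU_FPS, n) for n in range(1, 5)])
# 24.0 [20.0, 24.0, 24.0, 24.0]
# 59.94 [20.0, 40.0, 59.94, 59.94]
```

Even with ideal scaling, the 24 fps clip stops improving after the second card, while the 59.94 fps clip keeps benefiting until the third. This matches the pattern described above for ProRes 4444.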
Adding TNR, we start to really see some great GPU scaling since we are not hitting full playback FPS nearly as often. Once again, the CPU itself made very little difference, but in this test we saw decent gains with two and three GTX 1080 Ti cards and even a few more FPS with a fourth card. With the Titan V, however, we did hit a bit of a wall after three GPUs since that was often what was necessary to give full playback FPS. A fourth Titan V was useful in some isolated cases but for most users who work with 4K footage it is likely overkill.
6K Media – Live Playback FPS (RAW DATA)
Charts (live playback FPS per codec): 6K DNxHR HQ; 6K RED 12:1 (Full Res. and Half Res.); 6K RED 7:1 (Full Res. and Half Res.); 6K ARRIRAW (Full Res. and Half Res.)
6K Media – Live Playback FPS (Analysis)
Our 6K testing is not quite as extensive as our 4K testing, but it still contains over 460 data points across four different codecs, which can make it difficult to pull meaningful conclusions from the raw data. If you primarily work with just one of the codecs we tested, we highly recommend looking at that codec's results directly. For a more general take on GPU scaling in DaVinci Resolve on each platform, we again averaged the results across all types of media.
Starting with relatively simple color grading using the color wheels and 4 Power Windows, there is not much to discuss. With this level of grading, we were running at full FPS (or very near it) with every GPU and CPU combination we tested. The only exception was 6K RED 7:1 media with "Full Res." decode quality, where the Xeon W-2175 oddly saw significantly lower performance than the other two CPUs. We are not sure why this is, but we confirmed the result multiple times; for whatever reason, that CPU simply doesn't perform well with that exact codec, compression, and resolution.
Adding 3 OpenFX effects, we start to see a bit of a difference but interestingly the scaling appears to be worse than what we saw with our 4K test media. Once again, the CPU itself made very little impact on performance except with 6K RED 7:1 where the Xeon W-2175 gave lower than expected results. On the GPU side, we saw a decent performance gain going from one GTX 1080 Ti to two, but almost nothing when adding a third or fourth GPU. With the Titan V, the difference was even less as we saw virtually no benefit from using more than a single GPU.
Just like with the 4K results, this doesn't mean that Resolve doesn't scale well, but rather that we are hitting full FPS with just two GTX 1080 Ti GPUs or a single Titan V. Unlike the 4K testing, however, all of our media is 23.976 FPS so there is no higher framerate footage that might show a larger benefit from having more GPU power.
Adding TNR, we see improved GPU scaling up to three cards, but oddly we saw an overall drop in performance when we added a fourth card. This is a very unexpected result, but was remarkably consistent when using ARRIRAW at either decode quality or RED footage with "Full Res." decode quality.
Honestly, we have no idea why this is happening. At first, we thought it might be due to the PEX chip on the Xeon W system, which is used to divide 16 PCI-E lanes between the third and fourth GPUs (since that CPU doesn't have enough lanes to run all four GPUs at full x16). However, the Dual Xeon system runs all four GPUs at x16 and saw the exact same performance drop. We also thought it might be a CPU bottleneck, but we didn't see any significant difference between the two single-CPU setups (which should be roughly equal in performance) and the Dual Xeon setup, which has much more raw CPU horsepower.
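The lane arithmetic behind the PEX-chip theory is easy to check. This simplified sketch assumes 48 PCI-E lanes per CPU for both the Xeon W-2175 and each Xeon Gold 6148, per Intel's published specifications. It ignores lanes used by other devices, such as the NVMe drive, and any board-specific wiring.

```python
# Simplified PCI-E lane budget for four GPUs at x16.
# Assumption: 48 CPU lanes per Xeon W-2175 and per Xeon Gold 6148 (Intel specs);
# lanes consumed by other devices (NVMe, network, etc.) are ignored.
CPU_LANES = {"Xeon W-2175": 48, "2x Xeon Gold 6148": 2 * 48}


def lane_shortfall(cpu_lanes: int, gpus: int = 4, lanes_per_gpu: int = 16) -> int:
    """Lanes missing to give every GPU a dedicated x16 link (0 if none)."""
    return max(0, gpus * lanes_per_gpu - cpu_lanes)


for cpu, lanes in CPU_LANES.items():
    print(cpu, lane_shortfall(lanes))
# Xeon W-2175 16
# 2x Xeon Gold 6148 0
```

The 16-lane shortfall on the Xeon W platform corresponds to the third and fourth GPUs sharing a single x16 link through the PEX switch. The dual-socket system has lanes to spare, which is why its identical drop rules out the switch as the cause.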
8K Media – Live Playback FPS (RAW DATA)
Charts (live playback FPS per codec): 8K DNxHR HQ; 8K RED 12:1 (Full Res. and Half Res.); 8K RED 9:1 (Full Res. and Half Res.); 8K RED 7:1 (Full Res. and Half Res.)
8K Media – Live Playback FPS (Analysis)
Once again, if you primarily work with just one of the codecs we tested, we highly recommend looking at that codec's results directly. For a more general take on GPU scaling in DaVinci Resolve on each platform, however, we again averaged the results across all types of media.
Starting with relatively simple color grading, it may appear that there isn't much to discuss, but there is actually some very important data that isn't displayed well in the averaged chart above. While we hit full FPS with every CPU and GPU combination when using DNxHR HQ or any of the RED footage at "Half Res." decode quality, the "Full Res." results (especially at 9:1 and 12:1) were… odd. With these, we saw large differences in performance between the CPUs and a very consistent drop in performance with more than a single GPU. Unfortunately, there wasn't really a pattern to it. With 8K RED 12:1, the Core i9 7960X performed much better than the other two CPU platforms. However, with 8K RED 9:1 the Dual Xeon Gold 6148 was on top with a single GPU but saw a significant drop in performance as we added more GPUs, to the point that it was slower than the other CPUs by the time we got to four GPUs.
Adding 3 OpenFX effects, the results are again a bit odd even though the averaged chart above doesn't really show it very well. In most cases, there was a benefit to using two GTX 1080 Ti GPUs, but we didn't see much with a third or fourth card. Similarly, the Titan V did great as a single GPU, but there was almost no performance increase from using multiple cards.
Once again, RED 9:1 and 12:1 with "Full Res." decode was where things got weird. With 8K RED 12:1, the Core i9 7960X was again the best performing CPU, and we even saw a performance gain with two GTX 1080 Ti or two Titan V GPUs. However, with 8K RED 9:1 the results were all over the place. The Dual Xeon Gold 6148 was particularly unexpected: we saw a large performance gain going from one to two GPUs, then a moderate drop in performance with three GPUs, followed by a significant drop when a fourth GPU was added.
With TNR added, the results get a bit cleaner, but are still not quite what we expected. Once again, with "Full Res." decode quality on the RED footage we saw at best minimal gains with multiple GPUs, and often drops in performance as we added cards. If you stick to "Half Res." or use non-RAW media, however, we mostly saw pretty decent gains with up to three GPUs, although there was rarely a benefit to having a fourth card.
Conclusion
We were hoping that our results would end up being relatively straightforward, but unfortunately, reality has a knack for complicating things. Still, after analyzing all 1,500+ data points in depth, there are several interesting conclusions we can draw:
1: The CPU/platform makes very little difference
This will of course not hold true if you really skimp on the CPU, but with high-end models there was surprisingly little difference in playback performance. We did see some odd results here and there (especially with 8K footage), but overall there was minimal difference between the Core i9 7960X, Xeon W-2175, and Dual Xeon Gold 6148 CPUs. The Core i9 and Xeon W CPUs should be roughly equal in raw performance, but even the Dual Xeon (which has much higher raw CPU performance and more PCI-E lanes) gave at most a 1-2 FPS benefit on average. Considering the much higher cost of those CPUs, simply using more or higher-end GPUs is likely to be a more effective way to improve performance for most users.
However, different CPUs and motherboards may limit the number of GPUs you can use in your system, which is an important consideration.
2: RED footage at "Full Res." decode quality is… weird
This was not as much of an issue with 4K RED footage, but with 6K and especially 8K RED footage, using "Full Res." decode quality produced very odd results. Not only did we see lower playback FPS compared to "Half Res.", but in many cases "Full Res." decode resulted in a performance drop as we increased the number of GPUs. It may be that we are hitting a CPU or storage bottleneck, but given that we saw the same behavior with the Dual Xeon CPUs and are using a very fast storage drive (3,500 MB/s read), we think this is more likely an issue with DaVinci Resolve itself.
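To put the storage-bottleneck question in perspective, we can roughly estimate the read rate of the 8K RED clips. This is a back-of-the-envelope sketch. It assumes 16 bits per photosite before compression and that the REDCODE ratio divides that uncompressed size directly. Both are our simplifying assumptions rather than RED's published data rates.

```python
def redcode_read_rate_mb_s(width: int, height: int, fps: float,
                           compression_ratio: float,
                           bits_per_photosite: int = 16) -> float:
    """Approximate sustained read rate in MB/s (1 MB = 10**6 bytes).

    Assumes one sample per photosite at `bits_per_photosite` and that the
    compression ratio applies to that uncompressed size (simplifications).
    """
    bytes_per_frame = width * height * bits_per_photosite / 8
    return bytes_per_frame * fps / compression_ratio / 1e6


DRIVE_READ_MB_S = 3500  # Samsung 960 Pro sequential read, as quoted above

# 8K RED clips from the footage table: (width, height, fps, compression ratio)
CLIPS = {
    "8K RED 12:1": (8192, 4096, 23.976, 12),
    "8K RED 9:1": (8192, 4320, 25, 9),
    "8K RED 7:1": (8192, 4320, 23.976, 7),
}

for label, args in CLIPS.items():
    rate = redcode_read_rate_mb_s(*args)
    print(f"{label}: ~{rate:.0f} MB/s ({rate / DRIVE_READ_MB_S:.0%} of drive)")
```

Under these assumptions, even the 7:1 clip needs only a few hundred MB/s. That is well under 10% of the drive's rated read speed, which supports the view that raw storage throughput is not what limits "Full Res." playback.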
3: More GPUs is NOT always faster
With just basic color grading and 4 Power Windows, even a single GTX 1080 Ti was able to give us full playback FPS in almost every case, so there is simply no need for multiple GPUs. Adding OpenFX definitely put more load on the GPU(s), which allowed us to see a benefit from up to two GTX 1080 Ti GPUs, although there was still little benefit to having more than a single Titan V.
Adding TNR was really where we started to see the benefit of multiple GPUs. Discounting some of the weird results with RED footage at "Full Res." decode quality, we saw decent scaling with up to three GTX 1080 Ti GPUs and in some cases even saw a benefit with a fourth GTX 1080 Ti. With the Titan V, we saw the biggest benefit going from one to two GPUs but there was still some benefit to having a third Titan V. Adding a fourth card, however, rarely improved performance even though we were not hitting full playback FPS.
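When FPS changes as cards are added, looking at speedup and per-GPU efficiency makes diminishing returns and outright drops easier to spot than raw FPS does. This is a minimal sketch. The function name is hypothetical, and the example FPS values are made up to mimic the "fourth card hurts" pattern rather than taken from our charts.

```python
def scaling_table(fps_by_count):
    """Return (gpus, fps, speedup, efficiency, dropped) rows.

    `fps_by_count` maps GPU count -> playback FPS and must include 1.
    speedup is relative to one GPU, efficiency is speedup divided by the GPU
    count (1.0 = perfect linear scaling), and `dropped` flags a configuration
    that played back slower than the previous (smaller) configuration.
    """
    base = fps_by_count[1]
    rows = []
    prev_fps = None
    for gpus in sorted(fps_by_count):
        fps = fps_by_count[gpus]
        speedup = fps / base
        dropped = prev_fps is not None and fps < prev_fps
        rows.append((gpus, fps, speedup, speedup / gpus, dropped))
        prev_fps = fps
    return rows


# Hypothetical FPS values, not measured results.
example = {1: 10.0, 2: 18.0, 3: 24.0, 4: 22.0}
for gpus, fps, speedup, eff, dropped in scaling_table(example):
    note = "  <- slower than previous" if dropped else ""
    print(f"{gpus} GPU(s): {fps:5.1f} FPS, {speedup:.2f}x, {eff:.0%} efficiency{note}")
```

Low efficiency is not automatically a problem: if a clip is already playing at its native frame rate, extra GPUs have nothing left to do. A flagged drop, however, means the additional card is actively costing performance.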
So what would we recommend to someone looking for a high-end DaVinci Resolve workstation? For the average professional color grader, the platform (Core i9, Xeon W, Xeon SP) really shouldn't make much of a difference. Because of that, we would recommend a Core i9 CPU for two reasons. First, it is lower cost than comparable Xeon CPUs, which leaves more of your budget for GPU performance. Second, it is much more common, which means there should be fewer software/hardware bugs or other issues. On the GPU side, we would recommend either a pair of GTX 1080 Ti GPUs or a single Titan V. Two GTX 1080 Ti cards should be less expensive, but they make for a more complicated setup, which is more prone to odd performance scaling issues like what we saw with some RED footage when using "Full Res." decode quality.
For a best-of-the-best DaVinci Resolve workstation, we would go with two or maybe three Titan V GPUs. Even just two Titan V GPUs are slightly faster than four GTX 1080 Ti GPUs. And even though the Titan V cards should be a bit more expensive, having just two GPUs opens the door for smaller form factor systems or things like multiple Blackmagic DeckLink or RAID PCI-E cards. Again, the platform shouldn't make much of a difference here, although if you do opt for three Titan V cards, you could see a small performance gain with either a Xeon W or Dual Xeon SP setup.
Puget Systems offers a range of powerful and reliable systems that are tailor-made for your unique workflow.