Introduction
As processor core counts have increased over the past seventeen years, the need to keep all of those cores “fed” with data has grown too. Memory technology improvements and frequency increases have helped there, but another path available to chip designers is to offer more memory channels. Most mainstream platforms have supported dual-channel memory for a long time – from even before multi-core CPUs came on the scene – but in recent years high-end desktop, workstation, and server processors have often featured more memory channels. For example, Intel’s Core X line and AMD’s original Threadripper both supported four while Intel’s Xeon Scalable supported six (per CPU). AMD’s EPYC and Threadripper PRO are near the top currently, with eight memory channels available.
How much of a difference does that extra memory bandwidth really make, though? Adding channels does not improve how quickly any given bit of memory can be accessed – that is governed by the memory frequency (clock speed) and latency (how many clock cycles it takes to fulfill an access request). What adding more channels does is allow more individual pieces of data to be accessed at the same time, which increases the total amount of information that can be written to or read from system memory per second. That is why higher numbers of memory channels are usually found on processor platforms that offer more CPU cores – each core needs data to work with, so with more cores you need more data in total in order to keep them all working.
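To put rough numbers on that, theoretical peak bandwidth scales linearly with channel count. The sketch below assumes DDR4-3200 memory (the speed used in our testbed), meaning 3200 million transfers per second over a 64-bit (8-byte) data bus per channel. The function name is our own, and sustained real-world bandwidth will be lower than these peak figures.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_sec=3200, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumptions: DDR4-3200 (3200 MT/s) and a 64-bit (8-byte) bus per channel.
    """
    bytes_per_second = channels * mega_transfers_per_sec * 1e6 * bus_bytes
    return bytes_per_second / 1e9


for channels in (1, 2, 4, 8):
    print(f"{channels} channel(s): {peak_bandwidth_gbs(channels):.1f} GB/s")
```

Under those assumptions, each channel contributes 25.6 GB/s, so eight channels offer a theoretical peak of 204.8 GB/s while a single channel is limited to 25.6 GB/s.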
Test Platform and Methodology
Many of the CPU architectures that support high numbers of memory channels are built for server applications, but testing that type of workload is outside our area of expertise. Instead, we are looking today at the impact of memory channels on various workstation applications – especially in the realms of content creation, game development, and rendering. In order to put the most stress we can on the system memory, and to have a wide range of channel counts to test, we opted to use AMD’s latest Threadripper PRO 5000 WX series of processors. To see if CPU core count is a factor, we tested both the 24- and 64-core variants. Here are the full specifications for our testbed:
Motherboard | Asus Pro WS WRX80E-SAGE SE WIFI (Rev 1) |
Processor (CPU) | AMD Threadripper PRO 5965WX (24 cores), AMD Threadripper PRO 5995WX (64 cores) |
Memory (8 channels) | 8 x Kingston 16GB DDR4-3200 ECC Registered |
Memory (4 channels) | 4 x Kingston 32GB DDR4-3200 ECC Registered |
Memory (2 channels) | 2 x Kingston 64GB DDR4-3200 ECC Registered |
Memory (1 channel) | 1 x Samsung 128GB DDR4-3200 ECC Registered |
Video Card (GPU) | PNY GeForce RTX 4090 XLR8 24GB |
Solid State Drive (SSD) | Samsung 980 PRO 1TB NVMe SSD |
Operating System (OS) | Windows 11 Pro (version 22H2) |
Software / Benchmarks | Adobe Photoshop, Adobe Premiere Pro, Adobe After Effects, PugetBench, NeatBench 5, Unreal Engine 4.26, Cinebench R23, V-Ray 5 Benchmark |
We kept the total amount of system memory the same across each different RAM configuration, to ensure that would not affect our results. Each test was run twice, and the results shown in the charts below are the average of the two.
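As a quick sanity check of that setup, the sketch below confirms that every configuration in the table adds up to the same capacity, and shows how two runs are averaged. The dictionary and function names are our own, and the two scores passed to `average_runs` are hypothetical placeholders, not measured results.

```python
from statistics import mean

# Channels populated -> (number of DIMMs, size of each DIMM in GB), from the table above.
MEMORY_CONFIGS = {8: (8, 16), 4: (4, 32), 2: (2, 64), 1: (1, 128)}


def total_capacity_gb(dimm_count, dimm_size_gb):
    """Total installed memory for one configuration."""
    return dimm_count * dimm_size_gb


def average_runs(first_run, second_run):
    """Average of the two runs performed for each test."""
    return mean([first_run, second_run])


totals = {ch: total_capacity_gb(*cfg) for ch, cfg in MEMORY_CONFIGS.items()}
print(totals)  # every configuration totals 128 GB

# Hypothetical placeholder scores, only to show the averaging step.
print(average_runs(1000, 1010))
```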
I should also note that this testing was performed by our production qualification team, headed up by Ben Nelson. The data his team provided is what made this article possible.
Content Creation
First up, we have results from a trio of Adobe applications – using our PugetBench test suite – as well as NeatBench:
Across all four of these programs, and with both CPUs, performance drops as the number of memory channels is reduced. The decline is most pronounced and consistent in Premiere Pro and NeatBench, while Photoshop and After Effects are somewhat less affected. There are also a couple of interesting things to note:
- The higher core count TR PRO 5995WX is more impacted by a reduction in memory channels than the 5965WX. In both Premiere Pro and NeatBench, the 5995WX starts out with better performance than the 5965WX on 8 channels but ends up with worse performance by the time they reach a single channel. Since the 5995WX has more than double the number of cores, it makes perfect sense that it would be more affected by the loss of memory bandwidth.
- The 5995WX was unable to complete two of the benchmarks at all when reduced to one memory channel. Both Photoshop and After Effects could not complete a run of PugetBench in that condition, which probably indicates some sort of instability in certain calculations when the CPU is so starved for memory access. This is a trend we will see continue in the sections below.
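One way to see why the 64-core chip suffers more is to divide the available bandwidth by the number of cores sharing it. The sketch below is a simplified model rather than a measurement: it assumes a theoretical peak of 25.6 GB/s per DDR4-3200 channel and that all cores draw on memory evenly, which real applications rarely do. The names used are our own.

```python
# Assumption: theoretical peak per DDR4-3200 channel (3200 MT/s x 8 bytes).
PEAK_GBS_PER_CHANNEL = 25.6

CPU_CORES = {"5965WX": 24, "5995WX": 64}


def bandwidth_per_core(channels, cores):
    """Peak bandwidth per core in GB/s, assuming an even split across cores."""
    return channels * PEAK_GBS_PER_CHANNEL / cores


for channels in (8, 4, 2, 1):
    row = ", ".join(
        f"{cpu}: {bandwidth_per_core(channels, cores):.2f} GB/s per core"
        for cpu, cores in CPU_CORES.items()
    )
    print(f"{channels} channel(s) -> {row}")
```

In this model the 5995WX always has roughly 2.7 times less peak bandwidth per core than the 5965WX, and with a single channel each of its 64 cores is left with only about 0.4 GB/s.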
Game Development
For game dev, we have results from two common Unreal Engine workloads, compiling shaders and building lighting:
These aren’t stand-alone benchmark tests, so they aren’t measured with a score – instead, we are looking at how many seconds these tasks took to complete. As such, lower results are better / faster.
In the shader compile results we see the same sort of scaling that we did with Premiere Pro and NeatBench previously: the 5995WX starts out faster with the full set of memory channels, and then ends up slower than the 5965WX when you get down to just two channels. The drop in performance with each step is also the most pronounced here out of all the tests we ran for this article: more than a 50% loss from 8 to 4 channels on the 64-core processor, and that much again from 4 to 2. The 5995WX also displays the issue we saw before, failing to finish with a single memory channel. Compiling shaders appears to be very sensitive to memory bandwidth!
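Because these results are completion times rather than scores, it can help to convert them into relative performance and to see how successive losses compound. The sketch below uses helper names of our own, and the example times are hypothetical placeholders chosen only to illustrate the arithmetic, not figures from our charts.

```python
def relative_performance(baseline_seconds, seconds):
    """Performance relative to the baseline when lower times are better."""
    return baseline_seconds / seconds


def compound_losses(*losses):
    """Fraction of original performance left after successive fractional losses."""
    remaining = 1.0
    for loss in losses:
        remaining *= 1.0 - loss
    return remaining


# Hypothetical times: a task that takes twice as long runs at half the performance.
print(relative_performance(100.0, 200.0))  # 0.5

# Two successive 50% losses leave a quarter of the original performance.
print(compound_losses(0.5, 0.5))  # 0.25
```

In other words, losing half of the performance going from 8 to 4 channels and half again from 4 to 2 leaves only about a quarter of the 8-channel performance.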
For baking lighting, on the other hand, we see almost no difference in performance across the board. Both CPUs stay within a few percent regardless of the number of memory channels, with the sole exception of the 5995WX once again being unable to complete the test with just one active memory channel. Aside from that continuing issue, though, it looks like baking lighting is not very memory intensive.
As an aside, we are not diving into frame rates within Unreal Engine here. Those are much more heavily dependent on the video card, and some cursory tests we ran showed no substantial difference as the memory channel count varied – and in fact, very little difference between the two CPUs (the 24-core 5965WX was slightly faster, by only 2-3%).
Rendering
Our last tests focus on CPU-based rendering performance, which is a strength of the Threadripper PRO processors because of how many cores they have. That said, CPU rendering has largely been displaced these days by the much faster rendering times available with GPU-accelerated algorithms.
Before jumping into the analysis, it is worth noting a limitation of the benchmarks we used here: the scenes they test are relatively small, so they aren’t placing a huge amount of data in memory to begin with. That may be why we see almost no performance difference here as memory channels are scaled back, until the 5995WX shows a very low score in Cinebench and V-Ray with a single channel. It is very possible that a more real-world test, with a large and complex scene, could see a bigger impact on performance, as there would be a lot more data in memory that might need to be moved to and from the CPU over the course of rendering.
Conclusion
In the majority of the tests we ran, reducing the number of memory channels available to the processor – and thus overall memory bandwidth – resulted in a significant drop in performance. Some tests, like baking lighting (in Unreal Engine) and rendering (with simple scenes), saw little or no impact, but in photo editing, video editing, and VFX workloads we saw about 10-20% loss with 4 memory channels, another 1-30% loss going down to 2 channels, and even more with a single channel. Compiling shaders was worse, with the first drop from 8 to 4 channels cutting performance in half! And in some of our tests, the 64-core processor failed to finish at all when only one memory channel was populated.
Part of the reason we looked into this was that customers have asked if we would build systems without populating all of the memory channels. Sometimes that request comes from a desire to leave room for future upgrades, other times just to save money. Not only did we find out that performance is negatively impacted when doing so, but we are also unable to fully test the CPU and motherboard if memory channels are not filled. There could be a failure with a DIMM slot on the board or part of the CPU’s memory controller which might go undetected in such circumstances.
Can Threadripper PRO run with only four memory modules?
Technically yes, but in many applications there will be a drop in performance when not using the full eight channels these CPUs can support. Expect a 10 to 50% loss with half the memory channels populated, and worse if you reduce the channel count further. Personally, I can see no reason that would justify handicapping a workstation like this!