PNY Blog

AMBER 24 NVIDIA GPU Benchmarks

Written by PNY Pro | Sat, Aug 24, 2024 @ 05:00 PM

Quick AMBER GPU Benchmark Takeaways

NVIDIA Ada Lovelace architecture GPUs outperform all Ampere generation GPUs. Delivering up to 2 times the performance of the previous generation without requiring additional power makes RTX Ada Generation GPUs an exceptional value.

NVIDIA RTX™ 6000 Ada Generation offers exceptional performance with its larger memory capacity of 48GB and is multi-GPU scalable. For larger simulations, such as STMV Production NPT 4fs, high-speed memory, memory capacity, and GPU clock speed play a large role in performance. NVIDIA RTX 6000 Ada is the clear performance leader.

NVIDIA RTX 5000 Ada Generation and RTX 4500 Ada Generation perform well above last generation's flagship RTX A6000. With strong price-to-performance ratios, these may be the new best-value GPUs for AMBER.

For smaller simulations, the RTX 5000 Ada Generation delivers exceptional performance.

Benchmark Hardware and Specifications

GPUs Benchmarked

NVIDIA RTX Ada Generation GPU      NVIDIA RTX Ampere GPU

NVIDIA RTX 6000 Ada                NVIDIA RTX A6000

NVIDIA RTX 5000 Ada                NVIDIA RTX A5500

NVIDIA RTX 4500 Ada                NVIDIA RTX A5000

-                                  NVIDIA RTX A4500

-                                  NVIDIA RTX A4000

Exxact System Used for Benchmarks

System SKU:             TS4-173535991

Processor / Count:      2x AMD EPYC 7552

Total Logical Cores:    96

Memory:                 512GB DDR4 ECC

Storage:                2.84TB NVMe SSD

OS:                     CentOS 7

CUDA Version:           12.3

AMBER Version:          24

*All benchmarks were performed in a single-GPU configuration using Amber 24 and AmberTools 24 on NVIDIA® CUDA® 12.3. The newer software stack could explain the slight increase in performance over earlier Amber 22 results.

AMBER 24 Background & Hardware Recommendations

AMBER consists of several software packages, of which the molecular dynamics engine PMEMD is the most compute-intensive and the one we most want to optimize. PMEMD comes in single-CPU (pmemd), multi-CPU (pmemd.MPI), single-GPU (pmemd.cuda), and multi-GPU (pmemd.cuda.MPI) versions. Traditionally, MD simulations were executed on CPUs. However, the growing use of GPUs and AMBER's native CUDA support have made GPUs the most logical choice for speed and cost efficiency.

Most AMBER simulations fit on a single GPU and run entirely on CUDA, so the CPU, system memory (RAM), and storage speed have little to no influence on simulation throughput. Because a simulation fits on a single GPU, parallelizing one calculation across multiple GPUs won't yield much speedup. The best way to fully utilize a multi-GPU or multi-node deployment is to run multiple independent AMBER simulations simultaneously on multiple GPUs in the same node or on different nodes.
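As a rough illustration of running independent simulations side by side, the sketch below pins one pmemd.cuda process to each GPU through the CUDA_VISIBLE_DEVICES environment variable. The run directory names and the input/output file names (md.in, system.prmtop, and so on) are hypothetical placeholders, and the helper functions are our own, not part of AMBER. Treat it as a starting point under those assumptions rather than a recommended launcher.

```python
import os
import subprocess
from typing import Dict, List, Tuple

Job = Tuple[List[str], Dict[str, str], str]


def build_jobs(run_dirs: List[str], gpu_ids: List[int]) -> List[Job]:
    """Pair each run directory with one GPU and build its pmemd.cuda command."""
    if len(run_dirs) > len(gpu_ids):
        raise ValueError("more simulations than GPUs; queue the extras instead")
    jobs = []
    for run_dir, gpu in zip(run_dirs, gpu_ids):
        # File names are hypothetical; each run directory holds its own inputs.
        cmd = [
            "pmemd.cuda", "-O",
            "-i", "md.in",           # MD control input
            "-p", "system.prmtop",   # topology
            "-c", "system.inpcrd",   # starting coordinates
            "-o", "md.out",          # text output
            "-r", "md.rst",          # restart file
            "-x", "md.nc",           # trajectory
        ]
        # Restrict this process to a single GPU.
        env = {"CUDA_VISIBLE_DEVICES": str(gpu)}
        jobs.append((cmd, env, run_dir))
    return jobs


def launch(jobs: List[Job]) -> List[int]:
    """Start every job at once, then wait for all of them; returns exit codes."""
    procs = []
    for cmd, env, run_dir in jobs:
        full_env = {**os.environ, **env}
        procs.append(subprocess.Popen(cmd, cwd=run_dir, env=full_env))
    return [p.wait() for p in procs]


# Dry run: show what would be started on a hypothetical 4-GPU node.
jobs = build_jobs(["run_0", "run_1", "run_2", "run_3"], [0, 1, 2, 3])
for cmd, env, run_dir in jobs:
    print(run_dir, env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))
```

Calling launch(jobs) would actually start the simulations, provided pmemd.cuda is on the PATH and the run directories exist. On a cluster, a job scheduler would normally handle the GPU assignment instead.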

Hardware Recommendation

Our top 3 GPU recommendations for running AMBER and our reasoning:

  • For cost-effective parallel computing, the RTX 5000 Ada or the RTX 4500 Ada offers A-tier and B-tier performance at a much lower cost than the RTX 6000 Ada. The additional cost of the RTX 6000 Ada stems from its better performance and larger memory, which won’t be utilized in most AMBER calculations. The extra cost can instead be allocated to more GPUs and thus more calculations running in parallel. A deployment with 8x RTX 4500 Ada GPUs is similar in price to a deployment with 4x RTX 6000 Ada GPUs but can run twice as many simulations at once.
  • For peak throughput and parallel computing, the RTX 6000 Ada GPU delivers S-tier performance and allows deployments to slot 4x GPUs in a 2U node or 8x GPUs in a 4U node.
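The tradeoff between more mid-range GPUs and fewer flagship GPUs comes down to simple arithmetic, sketched below. It assumes every GPU runs its own independent simulation, so aggregate throughput scales linearly with GPU count. The function names are ours, and the relative throughput value in the usage lines is an illustrative placeholder, not a benchmark result.

```python
def aggregate_throughput(n_gpus: int, per_gpu_throughput: float) -> float:
    """Total throughput when each GPU runs one independent simulation."""
    return n_gpus * per_gpu_throughput


def breakeven_ratio(n_many: int, n_few: int) -> float:
    """Per-GPU throughput (relative to the faster GPU) above which the
    deployment with more, slower GPUs wins on aggregate throughput."""
    if n_many <= 0 or n_few <= 0:
        raise ValueError("GPU counts must be positive")
    return n_few / n_many


# 8 mid-range GPUs versus 4 flagship GPUs at a similar total price.
ratio = breakeven_ratio(n_many=8, n_few=4)
print(f"break-even relative throughput: {ratio}")  # 0.5

# Placeholder: suppose the mid-range GPU reaches 0.6x the flagship's speed.
print(aggregate_throughput(8, 0.6) > aggregate_throughput(4, 1.0))  # True
```

Under these assumptions, the 8-GPU deployment comes out ahead on total throughput whenever each of its GPUs delivers more than half the per-GPU throughput of the flagship.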

Our CPU & Memory Recommendation:

  • There is no need to overspend on a CPU since it will not run the calculations. The bare minimum is one CPU core for every GPU in the system. Systems with more GPUs may require dual CPUs to provide additional PCIe lanes.
  • The recommended RAM is 32GB per GPU, though you can get by with 16GB per GPU.
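These rules of thumb can be turned into a quick sizing check. The sketch below simply multiplies the per-GPU figures given above by the GPU count. The function and key names are hypothetical, and the result is a floor for an AMBER-only node, not a full system specification.

```python
from typing import Dict


def host_sizing(n_gpus: int,
                ram_per_gpu_gb: int = 32,
                min_ram_per_gpu_gb: int = 16) -> Dict[str, int]:
    """Minimum host resources for an AMBER-only node with n_gpus GPUs."""
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    return {
        "min_cpu_cores": n_gpus,                       # one core per GPU
        "recommended_ram_gb": n_gpus * ram_per_gpu_gb,
        "minimum_ram_gb": n_gpus * min_ram_per_gpu_gb,
    }


# A 4-GPU node: at least 4 cores, 128GB recommended, 64GB minimum.
print(host_sizing(4))
```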

Conclusion

Not all use cases are the same, and AMBER is most likely not the only application used in your research. At Exxact Corp., we strive to provide the resources to configure the best custom system for your needs.

Since AMBER’s performance is not highly affected by the rest of the system configuration, you may benefit from optimizing your system for other, more demanding applications that you also use. Applications like GROMACS or NAMD can benefit from additional cores or higher-end CPUs, so investing there is a tradeoff that can benefit your other workflows.