Unlocking Extreme Performance: OpenRADIOSS Scalability on CloudHPC with AMD EPYC Processors

Published by rupole1185 on

For engineers and researchers relying on explicit dynamic finite element analysis, OpenRADIOSS is a powerful and increasingly popular open-source software choice. But how well does such complex software scale in a High-Performance Computing (HPC) environment, especially when leveraging the elasticity of the cloud?

We dive into recent tests conducted on cloudhpc.cloud with OpenRADIOSS v20240710 to examine how OpenRADIOSS scales in an HPC environment on cutting-edge AMD EPYC processors. The results offer valuable insights for anyone looking to optimize their simulation workflows.

The Test Setup: Benchmarking OpenRADIOSS on CloudHPC

The benchmarks were performed on cloudhpc.cloud, utilizing two generations of AMD EPYC processors: the established Milan (3.3 GHz) and the newer, high-frequency Turin (4.1 GHz). Various configurations were tested, ranging from 16 to 96 cores, critically assessing the impact of using physical cores versus hyperthreading.

The benchmark model chosen for these tests was the 1 Million Element Neon model, a standard benchmark found on the OpenRADIOSS wiki. This model, with its 1 million finite elements, is particularly well-suited for testing single-node or low-node cluster performance, making it ideal for the core counts explored here.
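
For reference, a typical single-node run of this benchmark follows the usual OpenRADIOSS two-step workflow: the starter decomposes the model into MPI domains, and the MPI engine advances the explicit solution. The sketch below is a minimal example only; the executable names, deck names, and paths are illustrative assumptions and should be adapted to your own installation and to the Neon input files from the OpenRADIOSS wiki.

```python
import os
import subprocess

# Illustrative paths and file names: adjust to your OpenRADIOSS build and input decks.
EXEC_DIR = os.path.expanduser("~/OpenRadioss/exec")
STARTER = os.path.join(EXEC_DIR, "starter_linux64_gf")       # model read + domain decomposition
ENGINE = os.path.join(EXEC_DIR, "engine_linux64_gf_ompi")     # MPI-parallel explicit solver
STARTER_DECK = "NEON1M_0000.rad"   # hypothetical deck names for the 1M-element Neon model
ENGINE_DECK = "NEON1M_0001.rad"

N_MPI = 32        # one MPI domain per physical core (see the discussion below)
N_THREADS = 1     # OpenMP threads per MPI rank

env = dict(os.environ, OMP_NUM_THREADS=str(N_THREADS))

# 1) Starter: reads the model and splits it into N_MPI SPMD domains.
subprocess.run([STARTER, "-i", STARTER_DECK, "-np", str(N_MPI)], env=env, check=True)

# 2) Engine: runs the explicit time integration across the MPI domains.
subprocess.run(["mpirun", "-np", str(N_MPI), ENGINE, "-i", ENGINE_DECK], env=env, check=True)
```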

Analyzing the Results: Peak Performance Meets Efficiency

The comprehensive results highlight critical insights into maximizing OpenRADIOSS performance in a cloud HPC environment; the key configurations, run times, and costs are discussed below.

Key Insights from OpenRADIOSS Scalability on CloudHPC

Analyzing the data, several critical findings emerge that are paramount for understanding and optimizing OpenRADIOSS scalability on HPC resources:

1. AMD EPYC Turin Delivers a Significant Leap in Performance

One of the most striking observations is the substantial performance uplift offered by the AMD EPYC Turin processors compared to the Milan generation.

  • Direct Comparison: When comparing similarly configured instances, Turin consistently runs computations significantly faster. For instance, the hypercpu-32 (Turin, 16 cores with hyperthreading) completed the benchmark in 1.36 hours, nearly twice as fast as the highcpu-32 (Milan, 16 cores with hyperthreading), which took 2.68 hours. This translates to an impressive 1.97x speedup for Turin at this core count.
  • Scaling Up: The trend continues when examining the 32-core configurations without hyperthreading. The hypercore-32 (Turin, 32 cores) finished in a remarkable 0.68 hours, while the highcore-32 (Milan, 32 cores) took 1.71 hours. This represents a ~2.5x raw speed improvement directly attributable to the Turin architecture.

This demonstrates that for compute-intensive tasks like OpenRADIOSS simulations, upgrading to newer processor generations like AMD EPYC Turin can dramatically reduce simulation times.
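
To make these ratios concrete, here is a minimal sketch that reproduces the Turin-versus-Milan speedups from the wall-clock times quoted above (all figures are taken directly from the benchmark discussion):

```python
# Wall-clock times (hours) quoted above for the 1M-element Neon benchmark.
milan_16ht, turin_16ht = 2.68, 1.36       # highcpu-32 vs hypercpu-32 (16 cores + HT)
milan_32phys, turin_32phys = 1.71, 0.68   # highcore-32 vs hypercore-32 (32 physical cores)

def speedup(baseline_h: float, candidate_h: float) -> float:
    """Speedup of the candidate relative to the baseline (>1 means faster)."""
    return baseline_h / candidate_h

print(f"Turin vs Milan, 16 cores + HT     : {speedup(milan_16ht, turin_16ht):.2f}x")     # ~1.97x
print(f"Turin vs Milan, 32 physical cores : {speedup(milan_32phys, turin_32phys):.2f}x") # ~2.51x
```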

2. Physical Cores Outperform Hyperthreading for OpenRADIOSS

The results strongly suggest that OpenRADIOSS thrives when leveraging dedicated physical cores rather than logical cores provided by hyperthreading.

  • Milan Example: Consider the Milan processor. Using 16 physical cores with hyperthreading (highcpu-32, 32 logical threads) resulted in a run time of 2.68 hours. However, utilizing 32 physical cores without hyperthreading (highcore-32, 32 physical threads) brought the time down to 1.71 hours. Even though the logical thread count was the same, the use of more physical cores yielded a significant boost (approx. 57% faster).
  • Turin Example: The pattern is even more pronounced with Turin. On 16 physical cores with hyperthreading (hypercpu-32), the simulation took 1.36 hours. But on 32 physical cores without hyperthreading (hypercore-32), the time dropped to 0.68 hours – exactly half the time. This is essentially perfect linear scaling when doubling from 16 to 32 physical cores, and it suggests that hyperthreading contributed little to the 16-core configuration.

This insight is crucial: when configuring your cloud HPC instances for OpenRADIOSS, prioritize instances that allow you to maximize dedicated physical cores over relying on hyperthreading to increase thread count.
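
One practical way to follow this advice is to size the MPI rank count to the number of physical cores rather than logical CPUs. The sketch below assumes a Linux node where the standard lscpu utility is available; the helper name is illustrative.

```python
import os
import subprocess

def physical_core_count() -> int:
    """Count unique physical cores on a Linux node via `lscpu -p=CORE,SOCKET`.
    Falls back to os.cpu_count() (logical CPUs) if lscpu is unavailable."""
    try:
        out = subprocess.run(["lscpu", "-p=CORE,SOCKET"],
                             capture_output=True, text=True, check=True).stdout
        # Data lines look like "core,socket"; header lines start with "#".
        cores = {line for line in out.splitlines() if line and not line.startswith("#")}
        return len(cores)
    except (OSError, subprocess.CalledProcessError):
        return os.cpu_count() or 1

n_ranks = physical_core_count()
print(f"Launching {n_ranks} MPI ranks (one per physical core, hyperthreads ignored)")
```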

3. Optimal Workload Per Core is Critical for Scalability

For OpenRADIOSS scalability on HPC resources, simply throwing more cores at a problem does not guarantee a proportional speedup. The benchmark highlights the importance of maintaining an adequate “elements per core” ratio to avoid diminishing returns from communication overhead.

  • The 20k-30k Elements/Core Guideline: The 1M Element Neon model has 1,000,000 elements.
    • On the hypercore-32 instance (32 cores), this translates to approximately 31,250 elements per core. This configuration achieved a speedup of 3.95 and was the most cost-efficient in terms of vCPU / Hours.
    • When scaling up to hypercore-96 (96 cores), the elements per core dropped to about 10,416. While the run time decreased further to 0.55 hours (a speedup of 4.84), the scaling efficiency dropped significantly compared to the hypercore-32 case. Doubling from 16 to 32 cores (on Turin) halved the time, but going from 32 to 96 cores (a 3x increase in cores) only reduced the time from 0.68 h to 0.55 h – not a proportional decrease.

This exemplifies the general rule: for optimal OpenRADIOSS scalability, aim for roughly 20,000 to 30,000 elements (or more) per core. Below this threshold, the overhead associated with inter-processor communication and synchronization can begin to outweigh the benefits of additional computational resources.
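
A small helper can encode this guideline when sizing a run. The sketch below uses the 1M-element Neon model and the core counts tested above, with the ~20k elements/core threshold from this article (the function names are illustrative):

```python
def elements_per_core(n_elements: int, n_cores: int) -> float:
    return n_elements / n_cores

def within_guideline(n_elements: int, n_cores: int, minimum: int = 20_000) -> bool:
    """True if the per-core workload stays above the ~20k elements/core guideline."""
    return elements_per_core(n_elements, n_cores) >= minimum

NEON_ELEMENTS = 1_000_000
for cores in (16, 32, 96):
    epc = elements_per_core(NEON_ELEMENTS, cores)
    flag = "OK" if within_guideline(NEON_ELEMENTS, cores) else "below guideline"
    print(f"{cores:>3} cores -> {epc:>9,.0f} elements/core  [{flag}]")
# 32 cores -> ~31,250 elements/core [OK]
# 96 cores -> ~10,417 elements/core [below guideline]
```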

The Cost-Efficiency Angle: vCPU / Hours

The vCPU / Hours metric is essential for cloud computing, representing the total compute resources consumed. Lower values signify greater cost efficiency.

  • Turin’s Cost-Effectiveness: Across the board, AMD EPYC Turin configurations show significantly lower vCPU / Hours values compared to Milan. For example, hypercore-32 (Turin) consumes 21.72 vCPU / Hours, drastically lower than highcore-32 (Milan) at 54.60. This means you’re getting more computational work done per unit of virtual CPU time, directly translating to lower costs.
  • Sweet Spot for This Benchmark: For the 1M-element model, the hypercore-32 configuration (Turin, 32 physical cores) proved to be the most cost-efficient, having the lowest vCPU / Hours at 21.72. While hypercore-96 was faster in absolute terms, its vCPU / Hours jumped to 53.23, indicating that the additional cores were not as economically efficient for this specific problem size.
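
For planning purposes, vCPU / Hours can be estimated as the number of vCPUs allocated multiplied by the wall-clock hours, which is roughly consistent with the figures quoted above (small differences come from rounding). A minimal sketch using the run times reported in this benchmark:

```python
def vcpu_hours(n_vcpus: int, wall_time_h: float) -> float:
    """Estimated compute consumption: vCPUs allocated x wall-clock hours."""
    return n_vcpus * wall_time_h

runs = {
    "highcore-32  (Milan, 32 cores)": (32, 1.71),
    "hypercore-32 (Turin, 32 cores)": (32, 0.68),
    "hypercore-96 (Turin, 96 cores)": (96, 0.55),
}
for name, (vcpus, hours) in runs.items():
    print(f"{name}: ~{vcpu_hours(vcpus, hours):.1f} vCPU/Hours")
# hypercore-32 comes out lowest (~21.8), matching the quoted 21.72 within rounding.
```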

Conclusion: Maximizing OpenRADIOSS Performance in the Cloud

These tests on cloudhpc.cloud provide clear guidance for users seeking to optimize their OpenRADIOSS simulations:

  1. Embrace Modern Processors: Leverage the power of newer AMD EPYC Turin generation processors. They offer significant speedups (nearly 2x or more compared to Milan) and better cost efficiency.
  2. Prioritize Physical Cores: When configuring cloud instances, favor allocating dedicated physical cores over relying on hyperthreading for OpenRADIOSS workloads. This leads to better performance gains.
  3. Mind Your Element-to-Core Ratio: Ensure your simulation model provides sufficient computational work per core (aim for 20k-30k+ elements per core) to achieve efficient OpenRADIOSS scalability. This prevents diminishing returns and helps maintain cost-effectiveness.

By strategically applying these insights, engineers and researchers can significantly accelerate their OpenRADIOSS simulations on cloud HPC platforms, reducing turnaround times and optimizing resource utilization. The path to faster, more efficient explicit dynamic FEA simulations is clearer than ever.


CloudHPC is an HPC provider for running engineering simulations in the cloud. CloudHPC provides from 1 to 224 vCPUs per process across several configurations of HPC infrastructure, both multi-thread and multi-core. The current software range includes several CAE, CFD, FEA, and FEM packages, among which OpenFOAM, FDS, Blender, and several others.

New users benefit from a FREE trial of 300 vCPU/Hours to be used on the platform to test it, explore all its features, and verify whether it is suitable for their needs.


Categories: OpenRadioss
