
Roofline-Based Performance Analysis and Optimization of ROMS: Scalability, Bottlenecks, and Arithmetic Intensity

Tanya Gautam

Abstract

Ocean models such as the Regional Ocean Modeling System (ROMS) are used to study and predict oceanic behavior, enabling a better understanding of critical phenomena such as climate change, ocean circulation, and marine ecosystems. However, these models are computationally expensive because they simulate large-scale, complex oceanic processes. As high-resolution instruments and satellite imagery capture increasingly detailed ocean data, models must also run at higher resolution, which lengthens execution times. They therefore demand significant time and computational resources, which becomes a challenge when the model does not use those resources efficiently. Optimizing the performance of ocean models is thus crucial to make better use of available resources, reduce execution times, and improve scalability across advanced architectures. Optimization begins with a thorough analysis of the model to identify the performance bottlenecks that hinder efficiency and scalability. This report presents a performance analysis of ROMS using TAU (Tuning and Analysis Utilities) to pinpoint key inefficiencies. In addition to traditional profiling (identifying hotspots and communication delays), we computed the arithmetic intensity (AI) of critical kernel loops to understand the balance between computation and data movement. By highlighting low-AI, memory-bound routines, we show how reducing redundant loads and stores (for example, through loop fusion and cache blocking) can substantially cut DRAM traffic. For one such routine, simple loop fusion reduced its DRAM traffic and yielded a 1.6% reduction in execution time. These findings demonstrate how a roofline-guided methodology can expose low-AI, memory-bound bottlenecks and guide targeted transformations that advance ROMS performance on next-generation HPC systems.
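
As a rough illustration of the roofline reasoning in the abstract, the sketch below estimates arithmetic intensity (FLOPs per byte of DRAM traffic) for a hypothetical two-loop kernel before and after loop fusion. The kernel, the peak compute rate, and the memory bandwidth are illustrative assumptions, not ROMS code or measurements from the report. It also assumes arrays too large to stay in cache and ignores write-allocate traffic.

```python
WORD_BYTES = 8  # double-precision value


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of DRAM traffic."""
    return flops / bytes_moved


def attainable_gflops(ai, peak_gflops, bandwidth_gbs):
    """Roofline bound: the lower of the compute peak and AI * bandwidth."""
    return min(peak_gflops, ai * bandwidth_gbs)


# Hypothetical kernel, counted per grid point.
# Unfused: loop 1 computes t = a + b   (2 loads, 1 store, 1 FLOP)
#          loop 2 computes c = t * d   (2 loads, 1 store, 1 FLOP)
# Fused:   one loop, c = (a + b) * d   (3 loads, 1 store, 2 FLOPs)
# Fusion removes the store and reload of the temporary array t.
flops = 2
unfused_bytes = (2 + 1 + 2 + 1) * WORD_BYTES
fused_bytes = (3 + 1) * WORD_BYTES

ai_unfused = arithmetic_intensity(flops, unfused_bytes)
ai_fused = arithmetic_intensity(flops, fused_bytes)

# Assumed machine parameters, chosen only for illustration.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

bound_unfused = attainable_gflops(ai_unfused, PEAK_GFLOPS, BANDWIDTH_GBS)
bound_fused = attainable_gflops(ai_fused, PEAK_GFLOPS, BANDWIDTH_GBS)

print(f"unfused: {unfused_bytes} B/point, AI = {ai_unfused:.4f} FLOP/B, "
      f"bound = {bound_unfused:.2f} GFLOP/s")
print(f"fused:   {fused_bytes} B/point, AI = {ai_fused:.4f} FLOP/B, "
      f"bound = {bound_fused:.2f} GFLOP/s")
```

Under these assumptions both versions sit well below the compute peak, so the kernel stays memory-bound; fusion raises the bandwidth-limited ceiling only in proportion to the DRAM traffic it removes. The whole-application gain is smaller still, since a single routine accounts for only part of the total run time.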

[PDF]