Integrated parallel simulations and visualization for large-scale weather applications.Preeti Malakar
The emergence of the exascale era necessitates development of new techniques to efficiently perform high-performance scientific simulations, online data analysis and on-the- fly visualization. Critical applications like cyclone tracking and earthquake modeling require high-fidelity and high-performance simulations involving large-scale computations and generate huge amounts of data. Faster simulations and simultaneous online data analysis and visualization enable scientists provide real-time guidance to policy makers.
In this thesis, we present a set of techniques for efficient high-fidelity simulations, online data analysis and visualization in environments with varying resource configurations. First, we present a strategy for improving throughput of weather simulations with multiple regions of interest. We propose parallel execution of these nested simulations based on partitioning the 2D process grid into disjoint rectangular regions associated with each subdomain. The process grid partitioning is obtained from a Huffman tree which is constructed from the relative execution times of the subdomains. We propose a novel combination of performance prediction, processor allocation methods and topology-aware mapping of the regions on torus interconnects. We observe up to 33% gain over the default strategy in weather models.
Second, we propose a processor reallocation heuristic that minimizes data redistribution cost while reallocating processors in the case of dynamic regions of interest. This algorithm is based on hierarchical diffusion approach that uses a novel tree reorganization strategy. We have also developed a parallel data analysis algorithm to detect regions of interest within a domain. This helps improve performance of detailed simulations of multiple weather phenomena like depressions and clouds, thereby increasing the lead time to severe weather phenomena like tornadoes and storm surges. Our method is able to reduce the redistribution time by 25% over a simple partition from scratch method.
We also show that it is important to consider resource constraints like I/O bandwidth, disk space and network bandwidth for continuous simulation and smooth visualization. High simulation rates on modern-day processors combined with high I/O bandwidth can lead to rapid accumulation of data at the simulation site and eventual stalling of simulations. We show that formulating the problem as an optimization problem can determine optimal execution parameters for enabling smooth simulation and visualization. This approach proves beneficial for resource-constrained environments, whereas a naive greedy strategy leads to stalling and disk overflow. Our optimization method provides about 30% higher simulation rate and consumes about 25-50% lesser storage space than a naive greedy approach.
We have then developed an integrated adaptive steering framework, InSt, that analyzes the combined effect of user-driven steering with automatic tuning of application parameters based on resource constraints and the criticality needs of the application to determine the final parameters for the simulations. It is important to allow the climate scientists to steer the ongoing simulation, specially in the case of critical applications. InSt takes into account both the steering inputs of the scientists and the criticality needs of the application.Finally, we have developed algorithms to minimize the lag between the time when the simulation produces an output frame and the time when the frame is visualized. It is important to reduce the lag so that the scientists can get on-the-fly view of the simulation, and concurrently visualize important events in the simulation. We present most-recent, auto-clustering and adaptive algorithms for reducing lag. The lag-reduction algorithms adapt to the available resource parameters and the number of pending frames to be sent to the visualization site by transferring a representative subset of frames. Our adaptive algorithm reduces lag by 72% and provides 37% larger representativeness than the most-recent for slow networks.