

Project Team

Collaborators and Partners

Introduction

Nearly 900 million people live in low-lying coastal zones around the world and bear the brunt of more frequent and severe hurricanes and storm surges. Oceanographers simulate ocean circulation along the coasts to develop early warning systems that save lives and reduce property loss and damage from coastal hazards. Traditionally, such simulations are run with coastal ocean circulation models such as the Regional Ocean Modeling System (ROMS), usually on an HPC cluster with many CPU cores. However, the process is time-consuming and energy-intensive. Coarse-grained ROMS simulations are faster, but they sacrifice detail and accuracy, particularly in complex coastal environments.

Recent advances in deep learning and GPU architecture have enabled the development of faster AI (neural network) surrogates. This project introduces an AI surrogate to simulate coastal ocean circulation. Our approach not only accelerates simulations but also incorporates a physics-based constraint (water mass conservation law) to detect and correct inaccurate results, ensuring reliability while minimizing manual intervention. We develop a fully GPU-accelerated workflow, optimizing the model training and inference pipeline on NVIDIA DGX-2 A100 GPUs.
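As a rough illustration of how a water mass conservation check can flag unreliable surrogate output, the sketch below evaluates the residual of a one-dimensional depth-integrated continuity equation, d(zeta)/dt + d((h + zeta) u)/dx = 0, on a uniform grid. The 1-D form, the centered finite differences, the function names, and the tolerance are all assumptions made for illustration; the project's actual constraint and correction procedure are not described on this page.

```python
def continuity_residual(zeta_prev, zeta_next, u, h, dx, dt):
    """Residual of d(zeta)/dt + d(H*u)/dx = 0 at interior grid points.

    Hypothetical 1-D sketch, not the project's implementation.
    zeta_prev, zeta_next: free surface elevation at two consecutive steps (m)
    u: depth-averaged velocity at the later step (m/s)
    h: still-water depth (m); dx: grid spacing (m); dt: time step (s)
    """
    n = len(zeta_next)
    # Volume flux per unit width, using the total water column H = h + zeta.
    flux = [(h[i] + zeta_next[i]) * u[i] for i in range(n)]
    residual = []
    for i in range(1, n - 1):
        dzeta_dt = (zeta_next[i] - zeta_prev[i]) / dt
        dflux_dx = (flux[i + 1] - flux[i - 1]) / (2.0 * dx)  # centered difference
        residual.append(dzeta_dt + dflux_dx)
    return residual


def violates_mass_conservation(residual, tolerance):
    """Flag a predicted step whose largest absolute residual exceeds the tolerance."""
    return max(abs(r) for r in residual) > tolerance


if __name__ == "__main__":
    h = [10.0] * 5
    u = [0.0] * 5                      # still water: no flux divergence
    zeta_prev = [0.0] * 5
    zeta_next = [0.0, 0.0, 0.5, 0.0, 0.0]  # water appears with nothing flowing in
    res = continuity_residual(zeta_prev, zeta_next, u, h, dx=100.0, dt=10.0)
    print(violates_mass_conservation(res, tolerance=1e-3))
```

In a workflow like the one described here, a flagged step would be the point at which a correction is triggered, for example by recomputing that step with a trusted solver. That fallback is also an assumption, not a documented detail.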

Our current experimental results show that the AI surrogate reduces the time for a 12-day forecast from 9,908 seconds with traditional ROMS (on 512 CPU cores) to 22 seconds (on one A100 GPU), a speedup of over 450x, while maintaining high-quality simulation results. This work contributes to oceanographic modeling by offering a fast, accurate, and physically consistent alternative to traditional simulation models, particularly for real-time forecasting in rapid disaster response.
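The speedup figure follows directly from the two wall-clock times quoted above. The short check below uses only those two numbers; the variable names are illustrative.

```python
roms_seconds = 9908.0      # 12-day ROMS forecast on 512 CPU cores (from the text)
surrogate_seconds = 22.0   # same forecast on one A100 GPU (from the text)

# Wall-clock speedup: ratio of the two run times.
speedup = roms_seconds / surrogate_seconds
print(f"speedup: {speedup:.1f}x")  # prints "speedup: 450.4x"
```

This is a wall-clock ratio only; it does not normalize for the difference in hardware (512 CPU cores versus one GPU).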

Overall Workflow

workflow

ROMS and AI Surrogate 12-day Forecast Visualization

[Figure: panels for u, v, w, and ζ]

ROMS and AI Surrogate Free Surface Elevation ζ Prediction Comparison at Random Locations

time series

Speedup Comparison with MPI-Based ROMS

speedup

End-to-end efficiency of the integrated workflow, including verification and correction of the AI surrogate's inference results.

Training Scalability

scaling

Scalability of AI surrogate training with and without activation checkpointing, using 1 to 32 GPUs. The 1-, 2-, 4-, and 8-GPU experiments run on a single compute node, while the 16- and 32-GPU experiments use 2 and 4 compute nodes, respectively.
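One common way to read a scaling plot like this is through parallel efficiency: the single-GPU time divided by the GPU count times the multi-GPU time. The helper below is a minimal sketch that assumes a fixed total workload (strong scaling). The function name is hypothetical, and the timings in the usage lines are placeholders chosen for easy arithmetic, not measurements from this project.

```python
def scaling_efficiency(time_one_gpu, time_n_gpus, n_gpus):
    """Parallel efficiency for a fixed workload: 1.0 means ideal linear scaling."""
    return time_one_gpu / (n_gpus * time_n_gpus)


if __name__ == "__main__":
    # Placeholder timings in seconds (NOT project results).
    print(scaling_efficiency(100.0, 50.0, 2))   # ideal halving of the run time
    print(scaling_efficiency(100.0, 20.0, 8))   # sub-linear scaling
```

Efficiency typically drops once training spans multiple nodes, because gradient synchronization then crosses the inter-node network; whether and how much that applies here is what the figure reports.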

References

Acknowledgement