Benchmarks¶
To showcase the dataset and the associated benchmarking library, we provide a set of simple baselines time-boxed to 12 hours on a single NVIDIA H100 to demonstrate the effectiveness of naive approaches on these challenging problems and motivate the development of more sophisticated approaches. These baselines are trained on the forward problem - predicting the next snapshot of a given simulation from a short history of 4 time-steps. The models used here are the Fourier Neural Operator, Tucker-Factorized FNO, U-net and a modernized U-net using ConvNext blocks. The neural operator models are implemented using the neuraloperator library.
We emphasize that these settings are not selected to explore peak performance of modern machine learning, but rather that they reflect reasonable compute budgets and off-the-shelf choices that might be selected by a domain scientist exploring machine learning for their problems. Therefore we focus on popular models using settings that are either defaults or commonly tuned.
Test results¶
Dataset | FNO | TFNO | U-net | CNextU-net |
---|---|---|---|---|
acoustic_scattering_maze |
0.5062 | 0.5057 | 0.0351 | 0.0153 |
active_matter |
0.3691 | 0.3598 | 0.2489 | 0.1034 |
convective_envelope_rsg |
0.0269 | 0.0283 | 0.0555 | 0.0799 |
euler_multi_quadrants_periodicBC |
0.4081 | 0.4163 | 0.1834 | 0.1531 |
gray_scott_reaction_diffusion |
0.1365 | 0.3633 | 0.2252 | 0.1761 |
helmholtz_staircase |
0.00046 | 0.00346 | 0.01931 | 0.02758 |
MHD_64 |
0.3605 | 0.3561 | 0.1798 | 0.1633 |
planetswe |
0.1727 | 0.0853 | 0.3620 | 0.3724 |
post_neutron_star_merger |
0.3866 | 0.3793 | - | - |
rayleigh_benard |
0.8395 | 0.6566 | 1.4860 | 0.6699 |
rayleigh_taylor_instability (At = 0.25) |
>10 | >10 | >10 | >10 |
shear_flow |
0.1567 | 0.1348 | 0.5910 | 0.2037 |
supernova_explosion_64 |
0.3783 | 0.3785 | 0.3063 | 0.3181 |
turbulence_gravity_cooling |
0.2429 | 0.2673 | 0.6753 | 0.2096 |
turbulent_radiative_layer_2D |
0.5001 | 0.5016 | 0.2418 | 0.1956 |
turbulent_radiative_layer_3D |
0.5278 | 0.5187 | 0.3728 | 0.3667 |
viscoelastic_instability |
0.7212 | 0.7102 | 0.4185 | 0.2499 |
Table 1: Model Performance Comparison - VRMSE metrics on test sets (lower is better) for models performing best on the validation set (results below). Best results are shown in bold. VRMSE is scaled such that predicting the mean value of the target field results in a score of 1. Test set results for models performing best on the validation set.
Validation results¶
Dataset | FNO | TFNO | U-net | CNextU-net |
---|---|---|---|---|
acoustic_scattering_maze |
0.5033 | 0.5034 | 0.0395 | 0.0196 |
active_matter |
0.3157 | 0.3342 | 0.2609 | 0.0953 |
convective_envelope_rsg |
0.0224 | 0.0195 | 0.0701 | 0.0663 |
euler_multi_quadrants_periodicBC |
0.3993 | 0.4110 | 0.2046 | 0.1228 |
gray_scott_reaction_diffusion |
0.2044 | 0.1784 | 0.5870 | 0.3596 |
helmholtz_staircase |
0.00160 | 0.00031 | 0.01655 | 0.00146 |
MHD_64 |
0.3352 | 0.3347 | 0.1988 | 0.1487 |
planetswe |
0.0855 | 0.1061 | 0.3498 | 0.3268 |
post_neutron_star_merger |
0.4144 | 0.4064 | - | - |
rayleigh_benard |
0.6049 | 0.8568 | 0.8448 | 0.4807 |
rayleigh_taylor_instability (At = 0.25) |
0.4013 | 0.2251 | 0.6140 | 0.3771 |
shear_flow |
0.2963 | 0.2087 | 0.5799 | 0.3258 |
supernova_explosion_64 |
0.3804 | 0.3645 | 0.3242 | 0.2801 |
turbulence_gravity_cooling |
0.2381 | 0.2789 | 0.3152 | 0.2093 |
turbulent_radiative_layer_2D |
0.4906 | 0.4938 | 0.2394 | 0.1247 |
turbulent_radiative_layer_3D |
0.5199 | 0.5174 | 0.3635 | 0.3562 |
viscoelastic_instability |
0.7195 | 0.7021 | 0.3147 | 0.1966 |
Table 2: Dataset and model comparison in VRMSE metric on the validation sets, best result in bold. VRMSE is scaled such that predicting the mean value of the target field results in a score of 1.
Rollout loss (6:12)¶
Dataset | FNO \(\phantom{T}\) (6:12) | TFNO (6:12) | U-net (6:12) | CNextU-net (6:12) |
---|---|---|---|---|
acoustic_scattering_maze |
1.06 | 1.13 | 0.56 | 0.78 |
active_matter |
\(>\)10 | 7.52 | 2.53 | 2.11 |
convective_envelope_rsg |
0.28 | 0.32 | 0.76 | 1.15 |
euler_multi_quadrants_periodicBC |
1.13 | 1.23 | 1.02 | 4.98 |
gray_scott_reaction_diffusion |
0.89 | 1.54 | 0.57 | 0.29 |
helmholtz_staircase |
0.002 | 0.011 | 0.057 | 0.110 |
MHD_64 |
1.24 | 1.25 | 1.65 | 1.30 |
planetswe |
0.81 | 0.29 | 1.18 | 0.42 |
post_neutron_star_merger |
0.76 | 0.70 | --- | --- |
rayleigh_benard |
\(>\)10 | \(>\)10 | \(>\)10 | \(>\)10 |
rayleigh_taylor_instability |
\(>\)10 | 6.72 | \(>\)10 | \(>\)10 |
shear_flow |
1.62 | 1.63 | 1.22 | 0.32 |
supernova_explosion_64 |
2.41 | 1.86 | 0.94 | 1.12 |
turbulence_gravity_cooling |
3.55 | 4.49 | 7.14 | 1.30 |
turbulent_radiative_layer_2D |
1.79 | 6.01 | 0.66 | 0.54 |
turbulent_radiative_layer_3D |
0.81 | \(>\)10 | 0.95 | 0.77 |
viscoelastic_instability |
4.11 | 0.93 | 0.89 | 0.52 |
Rollout loss (13:30)¶
Dataset | FNO (13:30) | TFNO (13:30) | U-net (13:30) | CNextU-net (13:30) |
---|---|---|---|---|
acoustic_scattering_maze |
1.72 | 1.23 | 0.92 | 1.13 |
active_matter |
\(>\)10 | 4.72 | 2.62 | 2.71 |
convective_envelope_rsg |
0.47 | 0.65 | 2.16 | 1.59 |
euler_multi_quadrants_periodicBC |
1.37 | 1.52 | 1.63 | \(>\)10 |
gray_scott_reaction_diffusion |
\(>\)10 | \(>\)10 | \(>\)10 | 7.62 |
helmholtz_staircase |
0.003 | 0.019 | 0.097 | 0.194 |
MHD_64 |
1.61 | 1.81 | 4.66 | 2.23 |
planetswe |
2.96 | 0.55 | 1.92 | 0.52 |
post_neutron_star_merger |
1.05 | 1.05 | --- | --- |
rayleigh_benard |
\(>\)10 | \(>\)10 | \(>\)10 | \(>\)10 |
rayleigh_taylor_instability |
\(>\)10 | \(>\)10 | 2.84 | 7.43 |
shear_flow |
\(>\)10 | \(>\)10 | \(>\)10 | 1.91 |
supernova_explosion_64 |
\(>\)10 | \(>\)10 | 1.69 | 4.55 |
turbulence_gravity_cooling |
5.63 | 6.95 | 4.15 | 2.09 |
turbulent_radiative_layer_2D |
3.54 | \(>\)10 | 1.04 | 1.01 |
turbulent_radiative_layer_3D |
0.94 | \(>\)10 | 1.09 | 0.86 |
viscoelastic_instability |
--- | --- | --- | --- |
Table: Time-Averaged Losses by Window - VRMSE metrics on test sets (lower is better), averaged over time windows (6:12) and (13:30). Best results are shown in bold for (6:12) and underlined for (13:30). VRMSE is scaled such that predicting the mean value of the target field results in a score of 1.