Benchmarks¶

To showcase the dataset and the associated benchmarking library, we provide a set of simple baselines time-boxed to 12 hours on a single NVIDIA H100 to demonstrate the effectiveness of naive approaches on these challenging problems and motivate the development of more sophisticated approaches. These baselines are trained on the forward problem - predicting the next snapshot of a given simulation from a short history of 4 time-steps. The models used here are the Fourier Neural Operator, Tucker-Factorized FNO, U-net and a modernized U-net using ConvNext blocks. The neural operator models are implemented using the neuraloperator library.

We emphasize that these settings are not selected to explore peak performance of modern machine learning, but rather that they reflect reasonable compute budgets and off-the-shelf choices that might be selected by a domain scientist exploring machine learning for their problems. Therefore, we focus on popular models using settings that are either defaults or commonly tuned.

Benchmarked Model Checkpoints¶

Most of the checkpoints of the models are available on Hugging Face. To load a specific checkpoint follow the example below of the FNO model trained on the active_matter dataset.

from the_well.benchmark.models import FNO

model = FNO.from_pretrained("polymathic-ai/FNO-active_matter")

Test results¶

Dataset	FNO	TFNO	U-net	CNextU-net
`acoustic_scattering_maze`	0.5062	0.5057	0.0351	0.0153
`active_matter`	0.3691	0.3598	0.2489	0.1034
`convective_envelope_rsg`	0.0269	0.0283	0.0555	0.0799
`euler_multi_quadrants_periodicBC`	0.4081	0.4163	0.1834	0.1531
`gray_scott_reaction_diffusion`	0.1365	0.3633	0.2252	0.1761
`helmholtz_staircase`	0.00046	0.00346	0.01931	0.02758
`MHD_64`	0.3605	0.3561	0.1798	0.1633
`planetswe`	0.1727	0.0853	0.3620	0.3724
`post_neutron_star_merger`	0.3866	0.3793	-	-
`rayleigh_benard`	0.8395	0.6566	1.4860	0.6699
`rayleigh_taylor_instability` (At = 0.25)	>10	>10	>10	>10
`shear_flow`	1.189	1.472	3.447	0.8080
`supernova_explosion_64`	0.3783	0.3785	0.3063	0.3181
`turbulence_gravity_cooling`	0.2429	0.2673	0.6753	0.2096
`turbulent_radiative_layer_2D`	0.5001	0.5016	0.2418	0.1956
`turbulent_radiative_layer_3D`	0.5278	0.5187	0.3728	0.3667
`viscoelastic_instability`	0.7212	0.7102	0.4185	0.2499

Table 1: Model Performance Comparison - VRMSE metrics on test sets (lower is better) for models performing best on the validation set (results below). Best results are shown in bold. VRMSE is scaled such that predicting the mean value of the target field results in a score of 1. Test set results for models performing best on the validation set.

Validation results¶

Dataset	FNO	TFNO	U-net	CNextU-net
`acoustic_scattering_maze`	0.5033	0.5034	0.0395	0.0196
`active_matter`	0.3157	0.3342	0.2609	0.0953
`convective_envelope_rsg`	0.0224	0.0195	0.0701	0.0663
`euler_multi_quadrants_periodicBC`	0.3993	0.4110	0.2046	0.1228
`gray_scott_reaction_diffusion`	0.2044	0.1784	0.5870	0.3596
`helmholtz_staircase`	0.00160	0.00031	0.01655	0.00146
`MHD_64`	0.3352	0.3347	0.1988	0.1487
`planetswe`	0.0855	0.1061	0.3498	0.3268
`post_neutron_star_merger`	0.4144	0.4064	-	-
`rayleigh_benard`	0.6049	0.8568	0.8448	0.4807
`rayleigh_taylor_instability` (At = 0.25)	0.4013	0.2251	0.6140	0.3771
`shear_flow`	0.4450	0.3626	0.836	0.3972
`supernova_explosion_64`	0.3804	0.3645	0.3242	0.2801
`turbulence_gravity_cooling`	0.2381	0.2789	0.3152	0.2093
`turbulent_radiative_layer_2D`	0.4906	0.4938	0.2394	0.1247
`turbulent_radiative_layer_3D`	0.5199	0.5174	0.3635	0.3562
`viscoelastic_instability`	0.7195	0.7021	0.3147	0.1966

Table 2: Dataset and model comparison in VRMSE metric on the validation sets, best result in bold. VRMSE is scaled such that predicting the mean value of the target field results in a score of 1.

Rollout loss (6:12)¶

Dataset	FNO \(\phantom{T}\) (6:12)	TFNO (6:12)	U-net (6:12)	CNextU-net (6:12)
`acoustic_scattering_maze`	1.06	1.13	0.56	0.78
`active_matter`	\(>\)10	7.52	2.53	2.11
`convective_envelope_rsg`	0.28	0.32	0.76	1.15
`euler_multi_quadrants_periodicBC`	1.13	1.23	1.02	4.98
`gray_scott_reaction_diffusion`	0.89	1.54	0.57	0.29
`helmholtz_staircase`	0.002	0.011	0.057	0.110
`MHD_64`	1.24	1.25	1.65	1.30
`planetswe`	0.81	0.29	1.18	0.42
`post_neutron_star_merger`	0.76	0.70	---	---
`rayleigh_benard`	\(>\)10	\(>\)10	\(>\)10	\(>\)10
`rayleigh_taylor_instability`	\(>\)10	6.72	\(>\)10	\(>\)10
`shear_flow`	\(>\)10	\(>\)10	\(>\)10	2.33
`supernova_explosion_64`	2.41	1.86	0.94	1.12
`turbulence_gravity_cooling`	3.55	4.49	7.14	1.30
`turbulent_radiative_layer_2D`	1.79	6.01	0.66	0.54
`turbulent_radiative_layer_3D`	0.81	\(>\)10	0.95	0.77
`viscoelastic_instability`	4.11	0.93	0.89	0.52

Rollout loss (13:30)¶

Dataset	FNO (13:30)	TFNO (13:30)	U-net (13:30)	CNextU-net (13:30)
`acoustic_scattering_maze`	1.72	1.23	0.92	1.13
`active_matter`	\(>\)10	4.72	2.62	2.71
`convective_envelope_rsg`	0.47	0.65	2.16	1.59
`euler_multi_quadrants_periodicBC`	1.37	1.52	1.63	\(>\)10
`gray_scott_reaction_diffusion`	\(>\)10	\(>\)10	\(>\)10	7.62
`helmholtz_staircase`	0.003	0.019	0.097	0.194
`MHD_64`	1.61	1.81	4.66	2.23
`planetswe`	2.96	0.55	1.92	0.52
`post_neutron_star_merger`	1.05	1.05	---	---
`rayleigh_benard`	\(>\)10	\(>\)10	\(>\)10	\(>\)10
`rayleigh_taylor_instability`	\(>\)10	\(>\)10	2.84	7.43
`shear_flow`	\(>\)10	\(>\)10	\(>\)10	\(>\)10
`supernova_explosion_64`	\(>\)10	\(>\)10	1.69	4.55
`turbulence_gravity_cooling`	5.63	6.95	4.15	2.09
`turbulent_radiative_layer_2D`	3.54	\(>\)10	1.04	1.01
`turbulent_radiative_layer_3D`	0.94	\(>\)10	1.09	0.86
`viscoelastic_instability`	---	---	---	---

Table: Time-Averaged Losses by Window - VRMSE metrics on test sets (lower is better), averaged over time windows (6:12) and (13:30). Best results are shown in bold for (6:12) and underlined for (13:30). VRMSE is scaled such that predicting the mean value of the target field results in a score of 1.