Data Generation

Tip

For multi-GPU training, make sure the number of shards is divisible by the number of GPUs; 8 is usually a safe choice.
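The divisibility requirement can be checked up front. A minimal sketch (the helper name is ours, not part of the repo):

```python
def shards_per_gpu(num_shards: int, num_gpus: int) -> int:
    """Return how many shards each GPU rank reads; fail fast on uneven splits."""
    if num_shards % num_gpus != 0:
        raise ValueError(
            f"{num_shards} shards cannot be split evenly over {num_gpus} GPUs"
        )
    return num_shards // num_gpus
```

With 8 shards, any of 1, 2, 4, or 8 GPUs divides evenly, which is why 8 is a convenient default.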

Standard

export seed=42;

python scripts/generate_data.py base=pdedatagen/configs/navierstokes2dsmoke.yaml \
    experiment=smoke mode=train samples=256 seed=$seed pdeconfig.sample_rate=4 \
    dirname=/mnt/data/navierstokes;

python scripts/generate_data.py base=pdedatagen/configs/navierstokes2dsmoke.yaml \
    experiment=smoke mode=valid samples=32 seed=$seed pdeconfig.sample_rate=4 \
    dirname=/mnt/data/navierstokes;

python scripts/generate_data.py base=pdedatagen/configs/navierstokes2dsmoke.yaml \
    experiment=smoke mode=test samples=32 seed=$seed pdeconfig.sample_rate=4 \
    dirname=/mnt/data/navierstokes;

Conditioned

export seed=42;

python scripts/generate_data.py base=pdedatagen/configs/navierstokes2dsmoke.yaml \
    experiment=smoke_cond mode=train samples=256 seed=$seed \
    dirname=/mnt/data/navierstokes;

python scripts/generate_data.py base=pdedatagen/configs/navierstokes2dsmoke.yaml \
    experiment=smoke_cond mode=valid samples=32 seed=$seed \
    dirname=/mnt/data/navierstokes;

python scripts/generate_data.py base=pdedatagen/configs/navierstokes2dsmoke.yaml \
    experiment=smoke_cond mode=test samples=32 seed=$seed \
    dirname=/mnt/data/navierstokes;

Data normalization

The data was reasonably bounded, so we didn't need any normalization.

Shallow water 2D

export seed=42;

python scripts/generate_data.py base=pdedatagen/configs/shallowwater.yaml \
    experiment=shallowwater mode=train samples=256 seed=$seed \
    dirname=/mnt/data/shallowwater;

python scripts/generate_data.py base=pdedatagen/configs/shallowwater.yaml \
    experiment=shallowwater mode=valid samples=32 seed=$seed \
    dirname=/mnt/data/shallowwater;

python scripts/generate_data.py base=pdedatagen/configs/shallowwater.yaml \
    experiment=shallowwater mode=test samples=32 seed=$seed \
    dirname=/mnt/data/shallowwater;

Convert to zarr

We found that data loading was considerably more performant with the zarr format than with the original NetCDF format, especially on cloud storage. You can convert after data generation via:

for mode in train valid test; do
    python scripts/convertnc2zarr.py "/mnt/data/shallowwater/$mode";
done

Data normalization

python scripts/compute_normalization.py \
    --dataset shallowwater /mnt/data/shallowwater
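The normalization script accumulates per-field statistics over the generated data. The core idea can be sketched with numpy (the function below is an illustration of running-moment accumulation, not the script's actual interface):

```python
import numpy as np

def channel_stats(batches):
    """Accumulate per-channel mean/std over an iterable of (batch, channel, ...) arrays."""
    count, total, total_sq = 0, None, None
    for arr in batches:
        # Flatten everything except the channel axis.
        flat = arr.reshape(arr.shape[0], arr.shape[1], -1)
        s = flat.sum(axis=(0, 2))
        sq = (flat ** 2).sum(axis=(0, 2))
        n = flat.shape[0] * flat.shape[2]
        total = s if total is None else total + s
        total_sq = sq if total_sq is None else total_sq + sq
        count += n
    mean = total / count
    std = np.sqrt(total_sq / count - mean ** 2)
    return mean, std
```

Accumulating sums and sums of squares streams through the dataset in one pass, so the statistics can be computed without loading all samples into memory at once.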

Maxwell 3D

export seed=42

python scripts/generate_data.py base=pdedatagen/configs/maxwell3d.yaml \
    experiment=maxwell mode=train samples=256 seed=$seed dirname=/mnt/data/maxwell3d;

python scripts/generate_data.py base=pdedatagen/configs/maxwell3d.yaml \
    experiment=maxwell mode=valid samples=32 seed=$seed dirname=/mnt/data/maxwell3d;

python scripts/generate_data.py base=pdedatagen/configs/maxwell3d.yaml \
    experiment=maxwell mode=test samples=32 seed=$seed dirname=/mnt/data/maxwell3d;

Data normalization

python scripts/compute_normalization.py \
    --dataset maxwell /mnt/data/maxwell3d

PDEBench

Generating

Follow PDEBench's instructions.

Resharding for multi-GPU experiments

Coming soon...

Data normalization

Coming soon...

Your PDE

Please submit a pull request to add a data loading pipeline for your PDE dataset.