Form of a Batch
You must feed data to the model in the form of an `aurora.Batch`.
We now explain the exact form of `aurora.Batch`.
Overall Structure
Batches contain four things:

1. some surface-level variables,
2. some static variables,
3. some atmospheric variables, all at the same collection of pressure levels, and
4. metadata describing these variables: latitudes, longitudes, the pressure levels of the atmospheric variables, and the time of the data.

All variables in a batch are unnormalised. Normalisation happens internally in the model.
Before we explain the four components in detail, here is an example with randomly generated data:
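
```python
from datetime import datetime

import torch

from aurora import Batch, Metadata

batch = Batch(
    # Surface-level variables: (b, t, h, w) = (1, 2, 17, 32).
    surf_vars={k: torch.randn(1, 2, 17, 32) for k in ("2t", "10u", "10v", "msl")},
    # Static variables: (h, w) = (17, 32).
    static_vars={k: torch.randn(17, 32) for k in ("lsm", "z", "slt")},
    # Atmospheric variables: (b, t, c, h, w) = (1, 2, 4, 17, 32), one entry per level.
    atmos_vars={k: torch.randn(1, 2, 4, 17, 32) for k in ("z", "u", "v", "t", "q")},
    metadata=Metadata(
        lat=torch.linspace(90, -90, 17),
        # 32 longitudes in [0, 360): drop the final point, which equals 360.
        lon=torch.linspace(0, 360, 32 + 1)[:-1],
        time=(datetime(2020, 6, 1, 12, 0),),
        atmos_levels=(100, 250, 500, 850),
    ),
)
```
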
Batch.surf_vars
`Batch.surf_vars` is a dictionary mapping names of surface-level variables to the numerical values of the variables.
The surface-level variables must be of the form `(b, t, h, w)`, where `b` is the batch size, `t` the history dimension, `h` the number of latitudes, and `w` the number of longitudes.
All Aurora models produce the prediction for the next step from the current and previous step.
`surf_vars[:, 1, :, :]` must correspond to the current step, and `surf_vars[:, 0, :, :]` must correspond to the previous step, that is, the step before the current one.
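
The shape rules for surface-level variables can be checked before a batch reaches the model. The helper below is a minimal, hypothetical sketch (it is not part of the `aurora` package). It assumes the history dimension has exactly two entries, the previous and the current step, as described above, and it accepts plain tuples as well as `tensor.shape`, since `torch.Size` is a subclass of `tuple`.

```python
def check_surf_var_shape(shape, n_lat, n_lon):
    """Validate a (b, t, h, w) shape and return the batch size b.

    Hypothetical helper, not part of the aurora package.
    """
    if len(shape) != 4:
        raise ValueError(f"expected (b, t, h, w), got {tuple(shape)}")
    b, t, h, w = shape
    # Index 0 along t is the previous step and index 1 the current step,
    # so the history dimension is assumed to have exactly two entries.
    if t != 2:
        raise ValueError(f"expected a history dimension of size 2, got {t}")
    if (h, w) != (n_lat, n_lon):
        raise ValueError(f"expected a {n_lat} x {n_lon} grid, got {h} x {w}")
    return b


# The surface-level shape from the random example: one batch element,
# two steps, 17 latitudes and 32 longitudes.
print(check_surf_var_shape((1, 2, 17, 32), n_lat=17, n_lon=32))  # 1
```
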
The following surface-level variables are allowed:

| Name | Description |
|---|---|
| `2t` | Two-meter temperature in `K` |
| `10u` | Ten-meter eastward wind speed in `m/s` |
| `10v` | Ten-meter northward wind speed in `m/s` |
| `msl` | Mean sea-level pressure in `Pa` |

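
A common source of errors is a dictionary key that does not match the allowed names, for example `t2m`, which is how ERA5 NetCDF files typically name two-meter temperature. The following is a minimal sketch of a hypothetical check (not part of the `aurora` package); the allowed set holds the four surface-level names used in the random example above and would need to be extended with the additional names for Aurora 0.4° Air Pollution or Aurora 0.25° Wave.

```python
# Surface-level variable names used in the random example above.
ALLOWED_SURF_VARS = frozenset({"2t", "10u", "10v", "msl"})


def unknown_surf_vars(names, allowed=ALLOWED_SURF_VARS):
    """Return the names that are not in the allowed set, in sorted order.

    Hypothetical helper, not part of the aurora package.
    """
    return sorted(set(names) - set(allowed))


print(unknown_surf_vars(["2t", "10u", "10v", "msl"]))  # []
print(unknown_surf_vars(["t2m", "10u", "10v", "msl"]))  # ['t2m']
```
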
For Aurora 0.4° Air Pollution, the following surface-level variables are also allowed:
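
| Name | Description |
|---|---|
| | Particulate matter less than 1 µm in diameter, in `kg/m^3` |
| | Particulate matter less than 2.5 µm in diameter, in `kg/m^3` |
| | Particulate matter less than 10 µm in diameter, in `kg/m^3` |
| | Total column carbon monoxide in `kg/m^2` |
| | Total column nitrogen monoxide in `kg/m^2` |
| | Total column nitrogen dioxide in `kg/m^2` |
| | Total column sulphur dioxide in `kg/m^2` |
| | Total column ozone in `kg/m^2` |
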
For Aurora 0.25° Wave, the following surface-level variables are also allowed:

| Name | Description |
|---|---|
| | Significant wave height of the total wave in `m` |
| | Mean wave direction of the total wave in degrees |
| | Mean wave period of the total wave in `s` |
| | Peak wave period of the total wave in `s` |
| | Significant wave height of the wind wave component in `m` |
| | Mean wave direction of the wind wave component in degrees |
| | Mean wave period of the wind wave component in `s` |
| | Significant wave height of the total swell component in `m` |
| | Mean wave direction of the total swell component in degrees |
| | Mean wave period of the total swell component in `s` |
| | Significant wave height of the first swell component in `m` |
| | Mean wave direction of the first swell component in degrees |
| | Mean wave period of the first swell component in `s` |
| | Significant wave height of the second swell component in `m` |
| | Mean wave direction of the second swell component in degrees |
| | Mean wave period of the second swell component in `s` |
| | Ten-meter neutral wind speed in `m/s` |
| | Ten-meter eastward neutral wind speed in `m/s` |
| | Ten-meter northward neutral wind speed in `m/s` |

Batch.static_vars
`Batch.static_vars` is a dictionary mapping names of static variables to the numerical values of the variables.
The static variables must be of the form `(h, w)`, where `h` is the number of latitudes and `w` the number of longitudes.
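
Because static variables have neither a batch nor a history dimension, a field loaded with shape `(1, 1, h, w)` must be reduced to `(h, w)` first. The hypothetical check below (not part of the `aurora` package) sketches this, assuming that static variables share the latitude-longitude grid of the surface-level variables, since both are described by the same `Metadata.lat` and `Metadata.lon`.

```python
def check_static_var_shape(static_shape, surf_shape):
    """Check that a static (h, w) shape matches the grid of a surface-level
    variable with shape (b, t, h, w).

    Hypothetical helper, not part of the aurora package.
    """
    if len(static_shape) != 2:
        raise ValueError(f"expected (h, w), got {tuple(static_shape)}")
    grid = tuple(surf_shape[-2:])
    if tuple(static_shape) != grid:
        raise ValueError(f"static grid {tuple(static_shape)} != surface grid {grid}")


# Shapes from the random example: static (17, 32), surface (1, 2, 17, 32).
check_static_var_shape((17, 32), (1, 2, 17, 32))  # passes silently
```
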
The following static variables are allowed:

| Name | Description |
|---|---|
| `lsm` | Land-sea mask |
| `slt` | Soil type |
| `z` | Surface-level geopotential in `m^2/s^2` |

Aurora 0.4° Air Pollution and Aurora 0.25° Wave require additional static variables, which are not easy to compute yourself. Obtain these from the Hugging Face repository instead; see the description of the models.
Batch.atmos_vars
`Batch.atmos_vars` is a dictionary mapping names of atmospheric variables to the numerical values of the variables.
The atmospheric variables must be of the form `(b, t, c, h, w)`, where `b` is the batch size, `t` the history dimension, `c` the number of pressure levels, `h` the number of latitudes, and `w` the number of longitudes.
All atmospheric variables must contain the same collection of pressure levels in the same order.
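
The sketch below checks an atmospheric shape against the pressure levels and shows one way to put per-level fields into the order of `Metadata.atmos_levels`. Both helpers are hypothetical, not part of the `aurora` package, and the string placeholders stand in for real fields.

```python
def check_atmos_var_shape(shape, atmos_levels, n_lat, n_lon):
    """Validate a (b, t, c, h, w) shape and return (b, t).

    Hypothetical helper, not part of the aurora package.
    """
    if len(shape) != 5:
        raise ValueError(f"expected (b, t, c, h, w), got {tuple(shape)}")
    b, t, c, h, w = shape
    # One entry along c per pressure level, in the same order for every variable.
    if c != len(atmos_levels):
        raise ValueError(f"{c} levels in data but {len(atmos_levels)} in atmos_levels")
    if (h, w) != (n_lat, n_lon):
        raise ValueError(f"expected a {n_lat} x {n_lon} grid, got {h} x {w}")
    return b, t


def order_by_levels(fields_by_level, atmos_levels):
    """Return per-level fields in the order given by atmos_levels.

    Hypothetical helper, not part of the aurora package.
    """
    missing = [p for p in atmos_levels if p not in fields_by_level]
    if missing:
        raise KeyError(f"no data for pressure levels {missing}")
    return [fields_by_level[p] for p in atmos_levels]


levels = (100, 250, 500, 850)
print(check_atmos_var_shape((1, 2, 4, 17, 32), levels, n_lat=17, n_lon=32))  # (1, 2)
# Fields loaded in arbitrary order are re-ordered to match `levels`.
print(order_by_levels({850: "f850", 100: "f100", 500: "f500", 250: "f250"}, levels))
# ['f100', 'f250', 'f500', 'f850']
```

With PyTorch, if each per-level field of one variable has shape `(b, t, h, w)`, calling `torch.stack(fields, dim=2)` on the ordered list yields the `(b, t, c, h, w)` layout.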
The following atmospheric variables are allowed:

| Name | Description |
|---|---|
| `t` | Temperature in `K` |
| `u` | Eastward wind speed in `m/s` |
| `v` | Northward wind speed in `m/s` |
| `q` | Specific humidity in `kg/kg` |
| `z` | Geopotential in `m^2/s^2` |

For Aurora 0.4° Air Pollution, the following atmospheric variables are also allowed:
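
| Name | Description |
|---|---|
| | Carbon monoxide mass mixing ratio in `kg/kg` |
| | Nitrogen monoxide mass mixing ratio in `kg/kg` |
| | Nitrogen dioxide mass mixing ratio in `kg/kg` |
| | Sulphur dioxide mass mixing ratio in `kg/kg` |
| | Ozone mass mixing ratio in `kg/kg` |
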
Batch.metadata
`Batch.metadata` must be a `Metadata`, which contains the following fields:

- `Metadata.lat` is the vector of latitudes. The latitudes must be decreasing. The latitudes can either include both endpoints, like `linspace(90, -90, 721)`, or not include the south pole, like `linspace(90, -90, 721)[:-1]`. For curvilinear grids, this can also be a matrix, in which case the foregoing conditions apply to every column.
- `Metadata.lon` is the vector of longitudes. The longitudes must be increasing. The longitudes must be in the range `[0, 360)`, so they can include 0 but cannot include 360. For curvilinear grids, this can also be a matrix, in which case the foregoing conditions apply to every row.
- `Metadata.atmos_levels` is a `tuple` of the pressure levels of the atmospheric variables in hPa. These levels must correspond exactly, in order, to the levels along the `c` dimension of the atmospheric variables. Note also that `Metadata.atmos_levels` should be a `tuple`, not a `list`.
- `Metadata.time` is a `tuple` with, for each batch element, a `datetime.datetime` representing the time of the data. If the batch size is one, then this is a one-element `tuple`, e.g. `(datetime(2024, 1, 1, 12, 0),)`. Since all Aurora models require variables for the current and previous step, `Metadata.time` corresponds to the time of the current step. Specifically, `Metadata.time[i]` corresponds to the time of `Batch.surf_vars[i, -1]`.
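
The conditions on `Metadata` listed above can be checked in plain Python. The following is a minimal, hypothetical sketch (not part of the `aurora` package) that handles only one-dimensional latitude and longitude vectors, not curvilinear matrices. The check that latitudes lie in `[-90, 90]` is an assumption added for illustration; the text itself only names the two global forms.

```python
from datetime import datetime


def check_metadata(lat, lon, atmos_levels, time, batch_size):
    """Check the one-dimensional conditions on Metadata.

    Hypothetical helper, not part of the aurora package. `lat` and `lon` are
    plain sequences of floats; for torch tensors, pass `tensor.tolist()`.
    """
    if any(a <= b for a, b in zip(lat, lat[1:])):
        raise ValueError("latitudes must be decreasing")
    # Decreasing, so lat[0] is the largest value and lat[-1] the smallest.
    if lat[0] > 90 or lat[-1] < -90:
        raise ValueError("latitudes must lie in [-90, 90]")
    if any(a >= b for a, b in zip(lon, lon[1:])):
        raise ValueError("longitudes must be increasing")
    # Increasing, so lon[0] is the smallest value and lon[-1] the largest.
    if lon[0] < 0 or lon[-1] >= 360:
        raise ValueError("longitudes must lie in [0, 360)")
    if not isinstance(atmos_levels, tuple):
        raise TypeError("atmos_levels should be a tuple, not a list")
    if not isinstance(time, tuple) or len(time) != batch_size:
        raise ValueError("time must be a tuple with one entry per batch element")
    if not all(isinstance(t, datetime) for t in time):
        raise TypeError("every entry of time must be a datetime.datetime")


# The grid from the random example: 17 latitudes from 90 to -90 inclusive and
# 32 longitudes from 0 up to, but excluding, 360.
lat = [90 - 180 * i / 16 for i in range(17)]
lon = [360 * i / 32 for i in range(32)]
check_metadata(lat, lon, (100, 250, 500, 850), (datetime(2020, 6, 1, 12, 0),), batch_size=1)
```
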
Model Output
The output of `model.forward(batch)`, where `model` is an Aurora model, is again a `Batch`.
This batch has exactly the same form, with only one difference: the history dimension has size one.
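
The shape change can be written down directly. This hypothetical helper (not part of the `aurora` package) maps the shape of a surface-level or atmospheric input variable to the shape of the corresponding output variable; static variables have no history dimension, so it does not apply to them.

```python
def output_shape(input_shape):
    """Shape of a surface-level (b, t, h, w) or atmospheric (b, t, c, h, w)
    variable after the forward pass: identical, except t becomes 1.

    Hypothetical helper, not part of the aurora package.
    """
    if len(input_shape) not in (4, 5):
        raise ValueError(f"expected 4 or 5 dimensions, got {len(input_shape)}")
    b, _t, *rest = input_shape
    return (b, 1, *rest)


print(output_shape((1, 2, 17, 32)))  # (1, 1, 17, 32)
print(output_shape((1, 2, 4, 17, 32)))  # (1, 1, 4, 17, 32)
```
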