Form of a Batch
You must feed data to the model in the form of an aurora.Batch.
This section describes the exact form of aurora.Batch.
Overall Structure
Batches contain four things:
some surface-level variables,
some static variables,
some atmospheric variables all at the same collection of pressure levels, and
metadata describing these variables: latitudes, longitudes, the pressure levels of the atmospheric variables, and the time of the data.
All variables in a batch are unnormalised. Normalisation happens internally in the model.
Before we explain the four components in detail, here is an example with randomly generated data:
from datetime import datetime

import torch

from aurora import Batch, Metadata

batch = Batch(
    surf_vars={k: torch.randn(1, 2, 17, 32) for k in ("2t", "10u", "10v", "msl")},
    static_vars={k: torch.randn(17, 32) for k in ("lsm", "z", "slt")},
    atmos_vars={k: torch.randn(1, 2, 4, 17, 32) for k in ("z", "u", "v", "t", "q")},
    metadata=Metadata(
        lat=torch.linspace(90, -90, 17),
        lon=torch.linspace(0, 360, 32 + 1)[:-1],
        time=(datetime(2020, 6, 1, 12, 0),),
        atmos_levels=(100, 250, 500, 850),
    ),
)
Batch.surf_vars
Batch.surf_vars is a dictionary mapping names of surface-level variables to the numerical values
of the variables.
Each surface-level variable must have shape (b, t, h, w), where b is the batch size,
t the size of the history dimension, h the number of latitudes, and w the number of longitudes.
All Aurora models produce the prediction for the next step from the current and previous steps.
For every variable x in surf_vars, x[:, 1, :, :] must correspond to the current step
and x[:, 0, :, :] must correspond to the previous step, i.e. the step immediately before the current one.
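The shape and history requirements above can be checked before a batch is built. The following is a minimal sketch, not part of the aurora package: the helper name check_surf_vars is hypothetical, and it only assumes that every value exposes a .shape attribute, as torch.Tensor does. It also assumes that all surface-level variables share one shape.

```python
from types import SimpleNamespace


def check_surf_vars(surf_vars, history_size=2):
    """Check that every surface-level variable has shape (b, t, h, w).

    Hypothetical helper, not part of aurora. `surf_vars` maps variable names
    to array-like objects with a `.shape` attribute (e.g. torch.Tensor).
    Returns the common shape as a tuple, or None if `surf_vars` is empty.
    """
    common = None
    for name, values in surf_vars.items():
        shape = tuple(values.shape)
        if len(shape) != 4:
            raise ValueError(f"{name}: expected 4 dimensions (b, t, h, w), got {shape}")
        # Index 0 along the history dimension is the previous step, index 1 the current step.
        if shape[1] != history_size:
            raise ValueError(f"{name}: history dimension must be {history_size}, got {shape[1]}")
        if common is None:
            common = shape
        elif shape != common:
            raise ValueError(f"{name}: shape {shape} differs from {common}")
    return common


# Stand-ins for tensors, so that the sketch runs without torch installed.
fake = {k: SimpleNamespace(shape=(1, 2, 17, 32)) for k in ("2t", "10u", "10v", "msl")}
print(check_surf_vars(fake))
```

With real data, the same call should work on a dictionary of tensors, since only the .shape attribute is inspected.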
The following surface-level variables are allowed:
| Name | Description |
|---|---|
| 2t | Two-meter temperature in K |
| 10u | Ten-meter eastward wind speed in m/s |
| 10v | Ten-meter southward wind speed in m/s |
| msl | Mean sea-level pressure in Pa |
For Aurora 0.4° Air Pollution, the following surface-level variables are also allowed:
| Name | Description |
|---|---|
|  | Particulate matter less than 1 µm in kg/m³ |
|  | Particulate matter less than 2.5 µm in kg/m³ |
|  | Particulate matter less than 10 µm in kg/m³ |
|  | Total column carbon monoxide in kg/m² |
|  | Total column nitrogen monoxide in kg/m² |
|  | Total column nitrogen dioxide in kg/m² |
|  | Total column sulphur dioxide in kg/m² |
|  | Total column ozone in kg/m² |
For Aurora 0.25° Wave, the following surface-level variables are also allowed:
| Name | Description |
|---|---|
|  | Significant wave height of the total wave in m |
|  | Mean wave direction of the total wave in degrees |
|  | Mean wave period of the total wave in s |
|  | Peak wave period of the total wave in s |
|  | Significant wave height of the wind wave component in m |
|  | Mean wave direction of the wind wave component in degrees |
|  | Mean wave period of the wind wave component in s |
|  | Significant wave height of the total swell component in m |
|  | Mean wave direction of the total swell component in degrees |
|  | Mean wave period of the total swell component in s |
|  | Significant wave height of the first swell component in m |
|  | Mean wave direction of the first swell component in degrees |
|  | Mean wave period of the first swell component in s |
|  | Significant wave height of the second swell component in m |
|  | Mean wave direction of the second swell component in degrees |
|  | Mean wave period of the second swell component in s |
|  | Ten-meter neutral wind speed in m/s |
|  | Ten-meter eastward neutral wind speed in m/s |
|  | Ten-meter southward neutral wind speed in m/s |
Batch.static_vars
Batch.static_vars is a dictionary mapping names of static variables to the
numerical values of the variables.
Each static variable must have shape (h, w), where h is the number of latitudes
and w the number of longitudes.
The following static variables are allowed:
| Name | Description |
|---|---|
| lsm | Land-sea mask |
| slt | Soil type |
| z | Surface-level geopotential in m²/s² |
Aurora 0.4° Air Pollution and Aurora 0.25° Wave require additional static variables, but these are not easy to obtain yourself. You need to obtain these from the HuggingFace repository. See the description of the models.
Batch.atmos_vars
Batch.atmos_vars is a dictionary mapping names of atmospheric variables to the
numerical values of the variables.
Each atmospheric variable must have shape (b, t, c, h, w), where b is the batch size,
t the size of the history dimension, c the number of pressure levels, h the number of latitudes,
and w the number of longitudes.
All atmospheric variables must contain the same collection of pressure levels in the same order.
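The three groups of variables have to agree on the grid, the batch size, and the history size, and the atmospheric variables have to agree on the number of pressure levels. The following is a minimal sketch of such a consistency check. The helper check_grid_consistency is hypothetical and not part of the aurora package; it assumes that atmos_vars is non-empty and that every value exposes a .shape attribute. Shapes alone cannot reveal whether the levels are in the same order, so that remains the caller's responsibility.

```python
from types import SimpleNamespace


def check_grid_consistency(surf_vars, static_vars, atmos_vars, atmos_levels):
    """Check that all variables share one grid and one number of pressure levels.

    Hypothetical helper, not part of aurora. Returns the expected shape for
    each group of variables.
    """
    reference = tuple(next(iter(atmos_vars.values())).shape)
    if len(reference) != 5:
        raise ValueError(f"atmospheric variables need 5 dimensions, got {reference}")
    b, t, _, h, w = reference
    c = len(atmos_levels)  # one slice along dimension 2 per pressure level

    expected = {
        "atmos": (b, t, c, h, w),
        "surf": (b, t, h, w),
        "static": (h, w),
    }
    groups = {"atmos": atmos_vars, "surf": surf_vars, "static": static_vars}
    for kind, variables in groups.items():
        for name, values in variables.items():
            shape = tuple(values.shape)
            if shape != expected[kind]:
                raise ValueError(
                    f"{kind} variable {name!r}: expected {expected[kind]}, got {shape}"
                )
    return expected


def fake(*shape):
    """Stand-in for a tensor, so that the sketch runs without torch installed."""
    return SimpleNamespace(shape=shape)


expected = check_grid_consistency(
    surf_vars={k: fake(1, 2, 17, 32) for k in ("2t", "10u", "10v", "msl")},
    static_vars={k: fake(17, 32) for k in ("lsm", "z", "slt")},
    atmos_vars={k: fake(1, 2, 4, 17, 32) for k in ("z", "u", "v", "t", "q")},
    atmos_levels=(100, 250, 500, 850),
)
print(expected["atmos"])
```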
The following atmospheric variables are allowed:
| Name | Description |
|---|---|
| t | Temperature in K |
| u | Eastward wind speed in m/s |
| v | Southward wind speed in m/s |
| q | Specific humidity in kg/kg |
| z | Geopotential in m²/s² |
For Aurora 0.4° Air Pollution, the following atmospheric variables are also allowed:
| Name | Description |
|---|---|
|  | Carbon monoxide in kg/kg |
|  | Nitrogen monoxide in kg/kg |
|  | Nitrogen dioxide in kg/kg |
|  | Sulphur dioxide in kg/kg |
|  | Ozone in kg/kg |
Batch.metadata
Batch.metadata must be a Metadata, which contains the following fields:
Metadata.lat is the vector of latitudes. The latitudes must be decreasing. The latitudes can either include both endpoints, like linspace(90, -90, 721), or not include the south pole, like linspace(90, -90, 721)[:-1]. For curvilinear grids, this can also be a matrix, in which case the foregoing conditions apply to every column.
Metadata.lon is the vector of longitudes. The longitudes must be increasing. The longitudes must be in the range [0, 360), so they can include zero but cannot include 360. For curvilinear grids, this can also be a matrix, in which case the foregoing conditions apply to every row.
Metadata.atmos_levels is a tuple of the pressure levels of the atmospheric variables in hPa. The order of these levels must exactly match the order of the levels along the pressure-level dimension of the atmospheric variables. Note also that Metadata.atmos_levels should be a tuple, not a list.
Metadata.time is a tuple with, for each batch element, a datetime.datetime representing the time of the data. If the batch size is one, then this will be a one-element tuple, e.g. (datetime(2024, 1, 1, 12, 0),). Since all Aurora models require variables for the current and previous step, Metadata.time corresponds to the time of the current step. Specifically, Metadata.time[i] corresponds to the time of Batch.surf_vars[i, -1].
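The constraints on the metadata fields can be verified with a few lines of plain Python. The following is a minimal sketch for regular grids with one-dimensional latitudes and longitudes; curvilinear grids are not covered. The helper check_metadata_fields is hypothetical and not part of the aurora package, and it interprets "decreasing" and "increasing" as strictly monotone, which is an assumption.

```python
from datetime import datetime


def check_metadata_fields(lat, lon, atmos_levels, time):
    """Raise an error if the metadata fields violate the documented constraints.

    Hypothetical helper, not part of aurora. `lat` and `lon` are
    one-dimensional sequences of numbers.
    """
    lat = [float(x) for x in lat]
    lon = [float(x) for x in lon]

    # Latitudes run from north to south.
    if any(a <= b for a, b in zip(lat, lat[1:])):
        raise ValueError("latitudes must be decreasing")
    # Longitudes run eastward and stay within [0, 360).
    if any(a >= b for a, b in zip(lon, lon[1:])):
        raise ValueError("longitudes must be increasing")
    if not all(0 <= x < 360 for x in lon):
        raise ValueError("longitudes must lie in [0, 360)")

    if not isinstance(atmos_levels, tuple):
        raise TypeError("atmos_levels must be a tuple, not a list")
    if not isinstance(time, tuple) or not all(isinstance(t, datetime) for t in time):
        raise TypeError("time must be a tuple of datetime.datetime objects")


n_lat, n_lon = 17, 32
lat = [90 - 180 * i / (n_lat - 1) for i in range(n_lat)]  # 90 down to -90, both poles included
lon = [360 * j / n_lon for j in range(n_lon)]  # 0 up to, but excluding, 360

# Passes silently for valid metadata.
check_metadata_fields(lat, lon, (100, 250, 500, 850), (datetime(2020, 6, 1, 12, 0),))
```

A longitude vector that ends at 360, or pressure levels given as a list, would make the helper raise an error.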
Model Output
The output of aurora.forward(batch) will again be a Batch.
This batch is of exactly the same form, with only one difference:
the history dimension will have size one.
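Because the output has a history dimension of size one while the input needs two steps, feeding a prediction back into the model requires rebuilding the history. The following is a conceptual sketch on plain Python lists, where each list holds the entries along the history dimension of one variable. With tensors, the same idea would amount to a concatenation along dimension 1. The helper name next_history is hypothetical, and this is not necessarily how the aurora package handles this internally.

```python
def next_history(history, prediction):
    """Build the history for the next step.

    Hypothetical helper, not part of aurora. `history` holds
    [previous, current]; `prediction` holds [next], mirroring a model output
    whose history dimension has size one. Returns [current, next].
    """
    if len(history) != 2:
        raise ValueError("history must hold exactly two steps")
    if len(prediction) != 1:
        raise ValueError("prediction must hold exactly one step")
    # Drop the oldest step and append the newly predicted one.
    return [history[-1], prediction[0]]


history = ["step 0", "step 1"]
prediction = ["step 2"]
print(next_history(history, prediction))  # ['step 1', 'step 2']
```

The time stored in the metadata would have to be advanced accordingly, since it refers to the current step.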