Datasets#
Augmentation Policies#
Augmentations#
Data#
- class archai.supergraph.datasets.data.DataLoaders(train_dl: DataLoader | None = None, val_dl: DataLoader | None = None, test_dl: DataLoader | None = None)[source]#
- archai.supergraph.datasets.data.get_data(conf_loader: Config) DataLoaders [source]#
- archai.supergraph.datasets.data.create_dataset_provider(conf_dataset: Config) DatasetProvider [source]#
- archai.supergraph.datasets.data.get_dataloaders(ds_provider: DatasetProvider, load_train: bool, train_batch_size: int, load_test: bool, test_batch_size: int, aug, cutout: int, val_ratio: float, apex: ApexUtils, val_fold=0, img_size: int | None = None, train_workers: int | None = None, test_workers: int | None = None, target_lb=-1, max_batches: int = -1) Tuple[DataLoader | None, DataLoader | None, DataLoader | None] [source]#
Dataset Provider#
- class archai.supergraph.datasets.dataset_provider.DatasetProvider(conf_dataset: Config)[source]#
Distributed Stratified Sampler#
- class archai.supergraph.datasets.distributed_stratified_sampler.DistributedStratifiedSampler(dataset: Dataset, world_size: int | None = None, rank: int | None = None, shuffle: bool | None = True, val_ratio: float | None = 0.0, is_val_split: bool | None = False, max_samples: int | None = None)[source]#
Distributed stratified sampling of a dataset.
This sampler works in both distributed and non-distributed settings with no penalty in either mode and is a replacement for the built-in torch.utils.data.DistributedSampler.
In a distributed setting, many instances of the same code run as processes known as replicas. Each replica is assigned a sequential number by the launcher, starting from 0, that uniquely identifies it; this is known as the global rank, or simply rank. The number of replicas is known as the world size. In a non-distributed setting, world_size=1 and rank=0.
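As a brief sketch based only on the constructor signature above, rank and world size can be passed explicitly when a process group is initialized, or set to 1 and 0 for a non-distributed run; the CIFAR-10 dataset here is just to keep the snippet self-contained, and whether the sampler also infers these values automatically when they are left as None is not stated here.

```python
import torch.distributed as dist
from torchvision import transforms
from torchvision.datasets import CIFAR10

from archai.supergraph.datasets.distributed_stratified_sampler import DistributedStratifiedSampler

# CIFAR-10 already exposes a `targets` list, which the sampler relies on (see below).
dataset = CIFAR10(root='./data', train=True, download=True,
                  transform=transforms.ToTensor())

if dist.is_initialized():
    # Distributed run: each replica passes its own rank and the common world size.
    sampler = DistributedStratifiedSampler(dataset,
                                           world_size=dist.get_world_size(),
                                           rank=dist.get_rank())
else:
    # Non-distributed run: a single replica (world_size=1, rank=0) sees the whole dataset.
    sampler = DistributedStratifiedSampler(dataset, world_size=1, rank=0)
```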
This sampler assumes that the labels for each data point are available in the dataset.targets property, which should be an array-like containing as many values as the length of the dataset. This is already available for many popular datasets such as CIFAR and, with newer PyTorch versions, ImageFolder as well as DatasetFolder. If you are using a custom dataset, you can usually create this property with one line of code, such as dataset.targets = [yi for _, yi in dataset].
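A minimal sketch of that one-liner for a custom dataset, using the constructor parameters listed above; the toy dataset is illustrative, and the assumption that is_val_split=True selects the held-out validation partition is based on the parameter's name rather than documented behavior.

```python
import torch
from torch.utils.data import Dataset

from archai.supergraph.datasets.distributed_stratified_sampler import DistributedStratifiedSampler

class MyDataset(Dataset):
    """Toy dataset without a built-in `targets` attribute."""
    def __init__(self):
        self.data = torch.randn(100, 3, 32, 32)
        self.labels = torch.randint(0, 10, (100,))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

dataset = MyDataset()

# Expose labels through the `targets` property the sampler expects,
# using the one-liner from the paragraph above.
dataset.targets = [yi for _, yi in dataset]

# Single-process example: hold out 10% of samples as a stratified validation split.
# (Assumption: is_val_split=True yields the validation partition, False the training one.)
train_sampler = DistributedStratifiedSampler(dataset, world_size=1, rank=0,
                                             val_ratio=0.1, is_val_split=False)
val_sampler = DistributedStratifiedSampler(dataset, world_size=1, rank=0,
                                           val_ratio=0.1, is_val_split=True)
```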
To do distributed sampling, each replica must shuffle with the same seed as all other replicas at every epoch and then choose a subset of the dataset for itself. Traditionally, the epoch number is used as the shuffling seed for each replica. However, this requires that the training code call sampler.set_epoch(epoch) to set the seed at every epoch.
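For context, here is a minimal sketch of that conventional pattern using the built-in torch.utils.data.DistributedSampler referred to above; the toy TensorDataset and the single-replica arguments are only there to keep the snippet self-contained and runnable without an initialized process group.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy data: 256 samples of 8 features with labels from 10 classes.
dataset = TensorDataset(torch.randn(256, 8), torch.randint(0, 10, (256,)))

# With the built-in sampler, the training loop must reseed the shuffle every epoch.
sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

for epoch in range(3):
    sampler.set_epoch(epoch)  # every replica derives the same shuffle seed from the epoch
    for inputs, labels in loader:
        pass  # training step goes here
```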