Trainers#

Coin-Betting Optimizer#

class archai.trainers.coin_betting_optimizer.CocobBackprop(params: Iterable | Dict[str, Any], alpha: float | None = 100.0, eps: float | None = 1e-08)[source]#

Coin Betting optimizer with Backpropagation.

It has been proposed in Training Deep Networks without Learning Rates Through Coin Betting.

Reference:

https://arxiv.org/pdf/1705.07795.pdf

step(closure: Callable | None = None) FloatTensor[source]#

Performs a single optimization step (parameter update).

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
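
A minimal usage sketch. The linear model, random data, and training loop are illustrative assumptions; only the CocobBackprop constructor and step() follow the signatures documented above, and the optimizer is assumed to behave like a standard torch.optim.Optimizer:

    import torch
    from archai.trainers.coin_betting_optimizer import CocobBackprop

    # Toy model and data used only for illustration.
    model = torch.nn.Linear(10, 2)
    inputs, targets = torch.randn(8, 10), torch.randint(0, 2, (8,))

    # CocobBackprop requires no learning rate; alpha and eps keep their defaults.
    optimizer = CocobBackprop(model.parameters(), alpha=100.0, eps=1e-8)

    for _ in range(5):
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()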

class archai.trainers.coin_betting_optimizer.CocobOns(params: Iterable | Dict[str, Any], eps: float | None = 1e-08)[source]#

Coin Betting optimizer with Online Learning.

It has been proposed in Black-Box Reductions for Parameter-free Online Learning in Banach Spaces.

Reference:

https://arxiv.org/abs/1802.06293

step(closure: Callable | None = None) FloatTensor[source]#

Performs a single optimization step (parameter update).

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
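
A minimal usage sketch showing the closure form of step() documented above. The model, data, and closure body are illustrative assumptions that follow the standard torch.optim closure convention:

    import torch
    from archai.trainers.coin_betting_optimizer import CocobOns

    model = torch.nn.Linear(10, 2)  # placeholder model
    inputs, targets = torch.randn(8, 10), torch.randint(0, 2, (8,))

    optimizer = CocobOns(model.parameters(), eps=1e-8)

    def closure():
        # Reevaluates the model and returns the loss, as step() expects.
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()
        return loss

    loss = optimizer.step(closure)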

Cyclic Cosine Scheduler#

class archai.trainers.cyclic_cosine_scheduler.CyclicCosineDecayLR(optimizer: Optimizer, init_decay_epochs: int, min_decay_lr: float | List[float], restart_interval: int | None = None, restart_interval_multiplier: float | None = None, restart_lr: float | List[float] | None = None, warmup_epochs: int | None = None, warmup_start_lr: float | List[float] | None = None, last_epoch: int | None = -1, verbose: bool | None = False)[source]#

A learning rate scheduler for cyclic cosine annealing.

This scheduler is useful when performing quantization-aware training (QAT), providing a ~0.3 perplexity (ppl) improvement over the traditional cosine annealing scheduler.

For more details and code, see the project’s GitHub repository: abhuse/cyclic-cosine-decay

get_lr() float[source]#
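
A usage sketch under illustrative assumptions (the SGD optimizer, placeholder model, and all hyperparameter values below are arbitrary examples; only the constructor arguments follow the signature documented above):

    import torch
    from archai.trainers.cyclic_cosine_scheduler import CyclicCosineDecayLR

    model = torch.nn.Linear(10, 2)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Warm up for 2 epochs, decay for 10 epochs down to 1e-4, then restart
    # every 5 epochs, doubling the restart interval after each cycle.
    scheduler = CyclicCosineDecayLR(
        optimizer,
        init_decay_epochs=10,
        min_decay_lr=1e-4,
        restart_interval=5,
        restart_interval_multiplier=2.0,
        restart_lr=0.05,
        warmup_epochs=2,
        warmup_start_lr=0.01,
    )

    for epoch in range(30):
        # ... train for one epoch ...
        scheduler.step()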

Gradual Warmup Scheduler#

class archai.trainers.gradual_warmup_scheduler.GradualWarmupScheduler(optimizer: Optimizer, multiplier: float, total_epoch: int, after_scheduler: _LRScheduler | None = None)[source]#

Gradually warms up (increases) the learning rate in the optimizer.

It has been proposed in Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour.

get_lr() List[float][source]#
step(epoch: int | None = None, metrics: Dict[str, Any] | None = None) None[source]#
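
A usage sketch under illustrative assumptions. The SGD optimizer, the cosine annealing follow-up scheduler, and the chosen multiplier and epoch counts are arbitrary examples; only the constructor and step() arguments follow the signatures documented above:

    import torch
    from archai.trainers.gradual_warmup_scheduler import GradualWarmupScheduler

    model = torch.nn.Linear(10, 2)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # After warm-up, hand control to a cosine annealing schedule (illustrative choice).
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=95)

    # Ramp the learning rate up to multiplier * base_lr over the first 5 epochs.
    scheduler = GradualWarmupScheduler(
        optimizer, multiplier=8.0, total_epoch=5, after_scheduler=cosine
    )

    for epoch in range(100):
        # ... train for one epoch ...
        scheduler.step(epoch)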

LAMB Optimizer#

class archai.trainers.lamb_optimizer.Lamb(params: Iterable, lr: float | None = 0.001, betas: Tuple[float, float] | None = (0.9, 0.999), eps: float | None = 1e-06, weight_decay: float | None = 0.0, adam: bool | None = False)[source]#

Lamb algorithm for large batch optimization.

It has been proposed in Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.

Reference:

https://arxiv.org/abs/1904.00962

step(closure: Callable | None = None) FloatTensor[source]#

Performs a single optimization step (parameter update).

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
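
A minimal usage sketch. The model, data, and weight_decay value are illustrative assumptions; the constructor and step() follow the signatures documented above. The adam flag is assumed to switch off the LAMB trust-ratio scaling and fall back to Adam-style updates:

    import torch
    from archai.trainers.lamb_optimizer import Lamb

    model = torch.nn.Linear(10, 2)  # placeholder model
    inputs, targets = torch.randn(32, 10), torch.randint(0, 2, (32,))

    # adam=False (default) keeps the LAMB trust-ratio scaling.
    optimizer = Lamb(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                     eps=1e-6, weight_decay=0.01)

    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()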

class archai.trainers.lamb_optimizer.JITLamb(params: Iterable, lr: float | None = 0.001, betas: Tuple[float, float] | None = (0.9, 0.999), eps: float | None = 1e-06, weight_decay: float | None = 0.0, adam: bool | None = False)[source]#

JIT-based version of the Lamb algorithm for large batch optimization.

It has been proposed in Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.

Reference:

https://arxiv.org/abs/1904.00962

step(closure: Callable | None = None) FloatTensor[source]#

Performs a single optimization step (parameter update).

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
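
Since JITLamb exposes the same constructor and step() interface as Lamb, it is assumed here to be a drop-in replacement; the placeholder model is only for illustration:

    import torch
    from archai.trainers.lamb_optimizer import JITLamb

    model = torch.nn.Linear(10, 2)  # placeholder model
    optimizer = JITLamb(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                        eps=1e-6, weight_decay=0.01)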

Losses#

class archai.trainers.losses.SmoothCrossEntropyLoss(weight: Tensor | None = None, reduction: str | None = 'mean', smoothing: float | None = 0.0)[source]#

Cross entropy loss with label smoothing support.

reduction: str#
forward(inputs: Tensor, targets: Tensor) Tensor[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
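
A minimal usage sketch. The batch size, class count, and smoothing value are illustrative assumptions; the loss is called as a Module instance, as the note above recommends:

    import torch
    from archai.trainers.losses import SmoothCrossEntropyLoss

    # Logits for a batch of 4 examples over 5 classes, with integer class targets.
    logits = torch.randn(4, 5)
    targets = torch.tensor([0, 2, 1, 4])

    # smoothing=0.1 moves part of the target probability mass off the true class
    # (standard label smoothing); smoothing=0.0 recovers plain cross entropy.
    criterion = SmoothCrossEntropyLoss(smoothing=0.1)
    loss = criterion(logits, targets)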