- To add or remove data incrementally without recomputing the inverse from scratch, we can use the Woodbury formula (the matrix inversion lemma) to apply a low-rank update to the existing inverse. For more information, see Murphy 4.3.4.2.
- PyTorch: Models can be made to run faster and with less memory, at the cost of some numerical instability, by using torch.autocast (to make use of FP16).
- Gradient accumulation can be used to approximate a batch size larger than the one the hardware can fit in memory without running out of memory (OOM).
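
For the Woodbury note, a minimal sketch of the rank-1 special case (the Sherman-Morrison formula) may help. It assumes we maintain the inverse of a symmetric matrix A = XᵀX + λI, as in ridge regression; the names `add_row`, `remove_row`, `lam`, and the random data are illustrative assumptions, not taken from Murphy.

```python
import numpy as np

def add_row(A_inv, x):
    """Return (A + x x^T)^{-1} given A_inv = A^{-1}, assuming A is symmetric."""
    Ax = A_inv @ x
    return A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)

def remove_row(A_inv, x):
    """Return (A - x x^T)^{-1} given A_inv = A^{-1}, assuming A is symmetric."""
    Ax = A_inv @ x
    return A_inv + np.outer(Ax, Ax) / (1.0 - x @ Ax)

# Hypothetical data: 20 observations with 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
lam = 1e-3  # small ridge term so that A is invertible
A_inv = np.linalg.inv(X.T @ X + lam * np.eye(3))

# Add one observation, then remove it again, without a fresh inversion.
x_new = rng.normal(size=3)
A_inv_added = add_row(A_inv, x_new)
A_inv_back = remove_row(A_inv_added, x_new)
```

Each update costs O(d²) for d features, compared with O(d³) for a fresh inversion. Removing a row can be numerically delicate when 1 - xᵀA⁻¹x is close to zero.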
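
For the torch.autocast note, the following is a rough sketch of one mixed-precision training step. The model, data, and learning rate are placeholders chosen for illustration. It assumes FP16 when a CUDA device is available and falls back to bfloat16 on CPU, since CPU autocast is typically used with bfloat16.

```python
import torch
from torch import nn

device_type = "cuda" if torch.cuda.is_available() else "cpu"
# Assumption: FP16 on CUDA, bfloat16 on CPU.
amp_dtype = torch.float16 if device_type == "cuda" else torch.bfloat16

model = nn.Linear(16, 4).to(device_type)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 16, device=device_type)
y = torch.randn(8, 4, device=device_type)

# Only the forward pass and loss go inside the autocast context.
with torch.autocast(device_type=device_type, dtype=amp_dtype):
    out = model(x)  # the linear layer runs in reduced precision
    loss = nn.functional.mse_loss(out.float(), y)

# Backward and optimizer step run outside the context; weights stay FP32.
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With FP16, small gradients can underflow to zero, so real training loops usually pair autocast with a gradient scaler (torch.cuda.amp.GradScaler or its newer torch.amp equivalent). It is left out here to keep the sketch short.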
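
Gradient accumulation can be sketched as below. The model, the micro-batch size of 8, and `accum_steps = 4` are illustrative assumptions; together they emulate an effective batch size of 32.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(16, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accum_steps = 4  # micro-batches per optimizer step
# Hypothetical stream of 8 micro-batches, each with 8 examples.
micro_batches = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

optimizer.zero_grad()
num_updates = 0
for step, (x, y) in enumerate(micro_batches, start=1):
    # Divide by accum_steps so the summed gradients match the mean
    # over the larger effective batch.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in .grad across calls
    if step % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        num_updates += 1
```

For a plain mean loss with equally sized micro-batches, the accumulated gradient matches the large-batch gradient up to floating-point error. It is only an approximation when layers depend on batch statistics, such as batch normalization.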