- If we wish to incrementally add or remove data without recomputing the inverse from scratch, we can use the Woodbury formulas to perform a low-rank update, as sketched below. For more detail, see Murphy §4.3.4.2.
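For reference, the identity itself (this is the standard statement; the symbols A, U, C, V are generic, not taken from the source):

$$(A + UCV)^{-1} = A^{-1} - A^{-1} U \left(C^{-1} + V A^{-1} U\right)^{-1} V A^{-1}$$

The rank-one special case (Sherman-Morrison) covers adding or deleting a single data point:

$$(A + u v^\top)^{-1} = A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}$$

Applied to a Gram matrix $X^\top X$, this turns an $O(d^3)$ re-inversion into an $O(d^2)$ update per added or removed row of $X$.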
- PyTorch: models can be made to run faster and use less memory, at the cost of some numerical instability, by using torch.autocast (to make use of FP16).
- Gradient accumulation can approximate a batch size larger than the one the hardware allows without OOM-ing, by accumulating gradients over several micro-batches before each optimizer step. Both are sketched together below.
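A minimal sketch combining the two (the tiny model, the synthetic data, and accum_steps=4 are illustrative assumptions, not from the source):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16  # CPU autocast prefers bf16

model = nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
# GradScaler rescales the loss so FP16 gradients don't underflow; disabled (no-op) on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

accum_steps = 4  # effective batch size = micro-batch size * accum_steps
optimizer.zero_grad(set_to_none=True)
for step in range(16):
    x = torch.randn(8, 128, device=device)           # micro-batch of 8
    y = torch.randint(0, 10, (8,), device=device)
    with torch.autocast(device_type=device, dtype=amp_dtype):
        loss = loss_fn(model(x), y)
    # Divide the loss so the accumulated gradient averages over the effective batch.
    scaler.scale(loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)  # unscales gradients, skips the step on inf/NaN
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```

Note the gradients are summed across micro-batches between optimizer steps, which is why the loss is divided by accum_steps: otherwise the effective step would be accum_steps times too large.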
 
- PyTorch autograd only tracks PyTorch ops: any computation routed through non-PyTorch code (e.g. a NumPy round trip) is invisible to the graph, so gradients will not flow through it. An example follows.
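A minimal illustration (the NumPy round trip here is a hypothetical example of leaving PyTorch ops):

```python
import torch
import numpy as np

# Inside PyTorch ops: autograd records the graph and gradients flow.
x = torch.tensor([1.0, 2.0], requires_grad=True)
(x ** 2).sum().backward()
print(x.grad)  # tensor([2., 4.])

# Outside PyTorch ops: the NumPy round trip cuts the graph at .detach(),
# so the result has no grad_fn and backward() fails.
x2 = torch.tensor([1.0, 2.0], requires_grad=True)
z = torch.from_numpy(np.square(x2.detach().numpy())).sum()
try:
    z.backward()
except RuntimeError as e:
    print(e)  # element 0 of tensors does not require grad and does not have a grad_fn
```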