Yogi Optimizer
Enter Yogi (You Only Gradient Once).

Developed by researchers at Google, Yogi modifies Adam's adaptive learning rate mechanism to make it more robust to noisy gradients. Yogi adds a small amount of extra compute per step and may need slightly more memory than Adam, but in practice the overhead is negligible for most models.
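The key change is in the second-moment estimate: where Adam uses an exponential moving average of squared gradients, Yogi applies a sign-based additive update so the estimate changes at a controlled rate even under noisy gradients. Below is a minimal NumPy sketch of one step under that rule; the function name, the Adam-style bias correction, and the default hyperparameters are illustrative choices, not a definitive reference implementation.

```python
import numpy as np

def yogi_update(param, grad, m, v, t,
                lr=0.01, beta1=0.9, beta2=0.999, eps=1e-3):
    """One illustrative Yogi step (hypothetical helper, not a library API).

    Identical to Adam except for the second-moment update, which uses
    sign(v - g^2) so v moves toward g^2 by a bounded amount each step.
    """
    g2 = grad ** 2
    m = beta1 * m + (1 - beta1) * grad              # first moment, same as Adam
    v = v - (1 - beta2) * np.sign(v - g2) * g2      # Yogi's controlled update
    m_hat = m / (1 - beta1 ** t)                    # Adam-style bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

For comparison, Adam's second-moment line would be `v = beta2 * v + (1 - beta2) * g2`; when `v` is small, a single large noisy gradient can inflate it sharply, whereas Yogi's sign-based rule caps each change at `(1 - beta2) * g2` in magnitude.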