Accelerate Large Model Training using PyTorch Fully Sharded Data Parallel
smangrul, sgugger • May 2, 2022