This led to an increased effort within the deep learning community to develop larger models. To keep up with the growing network size, the scaling of training compute has also accelerated ...