Deep learning recommender systems often rely on embedding tables so large that they are difficult to fit in GPU memory. This post shows how to combine the model-parallel and data-parallel training paradigms to overcome this memory limit and train large deep learning recommender systems more quickly. I share the steps my team took to efficiently train a 113 billion-parameter…
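The core idea is to shard the memory-hungry embedding tables across GPUs (model parallelism) while replicating the comparatively small dense layers on every GPU (data parallelism). The PyTorch sketch below illustrates that partitioning; the feature names, table sizes, and MLP shape are hypothetical, and it assumes the distributed process group has already been initialized (for example with torchrun). In a full model, an all-to-all collective would exchange the looked-up embedding vectors so each rank assembles all features for its local mini-batch before running the data-parallel dense layers.

```python
# Minimal sketch of hybrid model-parallel / data-parallel partitioning.
# Assumes torch.distributed is already initialized (e.g. via torchrun);
# the table specs and layer sizes below are illustrative only.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

rank = dist.get_rank()
world_size = dist.get_world_size()
device = torch.device(f"cuda:{rank % torch.cuda.device_count()}")

# Hypothetical sparse features and their cardinalities.
table_specs = {
    "user_id": 10_000_000,
    "item_id": 5_000_000,
    "category": 100_000,
    "region": 10_000,
}

# Model parallelism: each rank owns only a subset of the embedding tables,
# so no single GPU has to hold all of the embedding parameters.
local_tables = nn.ModuleDict({
    name: nn.Embedding(num_rows, 128).to(device)
    for i, (name, num_rows) in enumerate(table_specs.items())
    if i % world_size == rank
})

# Data parallelism: the comparatively small dense MLP is replicated on
# every rank and kept in sync by DistributedDataParallel.
dense_mlp = DDP(
    nn.Sequential(
        nn.Linear(128 * len(table_specs), 256),
        nn.ReLU(),
        nn.Linear(256, 1),
    ).to(device),
    device_ids=[device.index],
)

# In the forward pass, each rank looks up its local tables for the whole
# batch, then an all-to-all exchange gathers the per-sample embeddings
# needed by each rank before they are fed into dense_mlp.
```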