GTC Silicon Valley 2019, ID S9321: Distributed Deep Learning with Horovod
Alex Sergeev (Uber Technologies, Inc.)
Learn how to scale distributed training of TensorFlow and PyTorch models with Horovod, a library designed to make distributed training fast and easy to use. Although frameworks like TensorFlow and PyTorch simplify the design and training of deep learning models, difficulties usually arise when scaling models to multiple GPUs in a server or multiple servers in a cluster. We'll explain the role of Horovod in taking a model designed on a single GPU and training it on a cluster of GPU servers.
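The abstract does not describe the mechanism, so the following is only a minimal sketch of the general idea behind data-parallel training. It assumes each worker keeps a full model replica, computes gradients on its own data shard, and applies the gradient averaged across all workers. It is plain Python and does not use Horovod. The names `allreduce_mean`, `local_gradient`, and `train` are hypothetical helpers invented for illustration, and the one-parameter model and toy data are likewise assumptions.

```python
# Simulated data-parallel SGD in plain Python (not Horovod's API).
# Each "worker" is just a data shard; all workers share the same weight.

def allreduce_mean(values):
    """Average one value per worker (stand-in for an allreduce step)."""
    return sum(values) / len(values)

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train(shards, lr=0.05, steps=200):
    w = 0.0  # every replica starts from the same weight
    for _ in range(steps):
        # Each worker computes a gradient on its own shard only.
        grads = [local_gradient(w, shard) for shard in shards]
        # Averaging keeps all replicas identical after the update.
        w -= lr * allreduce_mean(grads)
    return w

# Toy data generated from y = 3x, split across two simulated workers.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
shards = [data[:2], data[2:]]
w_final = train(shards)
```

In a real cluster the averaging step would run as a collective operation across processes rather than inside a single Python list comprehension. The sketch only shows why every replica ends up with the same weights after each step.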