Collective communications are a performance-critical ingredient of modern distributed AI training workloads such as recommender systems and natural language processing. The NVIDIA Collective Communication Library (NCCL), a Magnum IO library, implements GPU-accelerated collective operations: NCCL is topology-aware and is optimized to achieve high bandwidth and low latency over PCIe…
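As a rough illustration of the NCCL API (not taken from the post itself), the sketch below runs an all-reduce across the GPUs of a single node from one process. The `NGPUS` and `COUNT` constants and the buffer names are illustrative assumptions, and error checking is omitted for brevity.

```c
// Minimal single-process, multi-GPU all-reduce with NCCL.
// A sketch assuming a node with at least 2 GPUs; error handling trimmed.
#include <nccl.h>
#include <cuda_runtime.h>
#include <stdio.h>

#define NGPUS 2            // illustrative: number of local GPUs
#define COUNT (1 << 20)    // illustrative: elements per GPU

int main(void) {
  int devs[NGPUS] = {0, 1};
  ncclComm_t comms[NGPUS];
  float* sendbuf[NGPUS];
  float* recvbuf[NGPUS];
  cudaStream_t streams[NGPUS];

  // Allocate device buffers and a stream on each GPU.
  for (int i = 0; i < NGPUS; ++i) {
    cudaSetDevice(devs[i]);
    cudaMalloc((void**)&sendbuf[i], COUNT * sizeof(float));
    cudaMalloc((void**)&recvbuf[i], COUNT * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  // One communicator per GPU, all owned by this process.
  ncclCommInitAll(comms, NGPUS, devs);

  // Launch the collective on every rank; NCCL chooses a topology-aware
  // schedule internally. Group calls batch the per-GPU launches.
  ncclGroupStart();
  for (int i = 0; i < NGPUS; ++i) {
    ncclAllReduce(sendbuf[i], recvbuf[i], COUNT, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  }
  ncclGroupEnd();

  // Wait for completion, then clean up.
  for (int i = 0; i < NGPUS; ++i) {
    cudaSetDevice(devs[i]);
    cudaStreamSynchronize(streams[i]);
    cudaFree(sendbuf[i]);
    cudaFree(recvbuf[i]);
    cudaStreamDestroy(streams[i]);
    ncclCommDestroy(comms[i]);
  }
  printf("all-reduce complete\n");
  return 0;
}
```

In a multi-node job, the `ncclCommInitAll` convenience call would instead be replaced by `ncclGetUniqueId` plus `ncclCommInitRank`, with the unique ID broadcast to all ranks out of band (for example, via MPI).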
Five months have passed since v1.0, so it is time for another round of the MLPerf training benchmark. In this v1.1 edition, optimizations across the entire hardware and software stack continue to improve results throughout the benchmarking suite for submissions based on the NVIDIA platform. This improvement is observed consistently at every scale, from single machines all the way to industrial…