Roman Dubtsov – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Updated: 2025-07-22T23:29:46Z
http://www.open-lab.net/blog/feed/

Boosting Matrix Multiplication Speed and Flexibility with NVIDIA cuBLAS 12.9
Author: Roman Dubtsov
http://www.open-lab.net/blog/?p=99184
Published: 2025-05-01T20:00:00Z
Updated: 2025-07-01T16:36:10Z

The NVIDIA CUDA-X math libraries empower developers to build accelerated applications for AI, scientific computing, data processing, and more. Two of the most important applications of CUDA-X libraries are LLM training and inference, whether for everyday consumer applications or for highly specialized scientific domains such as drug discovery. Multiple CUDA-X libraries are indispensable…

New cuBLAS 12.0 Features and Matrix Multiplication Performance on NVIDIA Hopper GPUs
Author: Roman Dubtsov
http://www.open-lab.net/blog/?p=60111
Published: 2023-02-01T18:30:00Z
Updated: 2025-07-22T23:29:46Z

The NVIDIA H100 Tensor Core GPU, based on the NVIDIA Hopper architecture with fourth-generation NVIDIA Tensor Cores, recently debuted, delivering unprecedented performance and sweeping AI benchmarks such as MLPerf Training. A significant fraction of the operations in AI and machine learning benchmarks are general matrix multiplications (GEMMs), which are also referred to as matmul…
