The latest state-of-the-art foundation large language models (LLMs) have billions of parameters and are pretrained on trillions of tokens of input text. They often achieve striking results on a wide variety of use cases without any need for customization. Despite this, studies have shown that the best accuracy on downstream tasks can be achieved by adapting LLMs with high-quality…
Today’s AI-powered applications are enabling richer experiences, fueled by larger, more complex AI models and by pipelines that chain many models together. To meet the increasing demands of AI-infused applications, an AI platform must not only deliver high performance but also be versatile enough to deliver that performance across a diverse range of AI models.
This is the first part of a two-part series discussing the NVIDIA Triton Inference Server’s FasterTransformer (FT) library, one of the fastest libraries for distributed inference of transformers of any size (up to trillions of parameters). It provides an overview of FasterTransformer, including the benefits of using the library. Join the NVIDIA Triton and NVIDIA TensorRT community to stay…
This is the second part of a two-part series about NVIDIA tools that allow you to run large transformer models for accelerated inference. For an introduction to the FasterTransformer library (Part 1), see Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server. Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates…