Jiwei Liu – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2024-11-20T23:02:36Z http://www.open-lab.net/blog/feed/ Jiwei Liu <![CDATA[RAG 101: Retrieval-Augmented Generation Questions Answered]]> http://www.open-lab.net/blog/?p=75743 2024-11-20T23:02:36Z 2023-12-18T19:44:42Z Data scientists, AI engineers, MLOps engineers, and IT infrastructure professionals must consider a variety of factors when designing and deploying a RAG...]]>

Data scientists, AI engineers, MLOps engineers, and IT infrastructure professionals must consider a variety of factors when designing and deploying a RAG pipeline, from core components such as the LLM to evaluation approaches. The key point is that RAG is a system, not just a model or set of models. This system consists of several stages, which were discussed at a high level in RAG 101…

Jiwei Liu <![CDATA[Deploying Retrieval-Augmented Generation Applications on NVIDIA GH200 Delivers Accelerated Performance]]> http://www.open-lab.net/blog/?p=74632 2024-09-22T15:11:34Z 2023-12-18T17:00:00Z Large language model (LLM) applications are essential in enhancing productivity across industries through natural language. However, their effectiveness is...]]>

Large language model (LLM) applications are essential in enhancing productivity across industries through natural language. However, their effectiveness is often limited by the extent of their training data, resulting in poor performance when dealing with real-time events and new knowledge the LLM isn’t trained on. Retrieval-augmented generation (RAG) solves these problems.

Jiwei Liu <![CDATA[Unlocking Multi-GPU Model Training with Dask XGBoost]]> http://www.open-lab.net/blog/?p=70261 2023-09-07T18:30:59Z 2023-09-07T17:09:18Z As data scientists, we often face the challenging task of training large models on huge datasets. One commonly used tool, XGBoost, is a robust and efficient...]]>

As data scientists, we often face the challenging task of training large models on huge datasets. One commonly used tool, XGBoost, is a robust and efficient gradient-boosting framework that’s been widely adopted due to its speed and performance for large tabular data. Using multiple GPUs should theoretically provide a significant boost in computational power, resulting in faster model…

Jiwei Liu <![CDATA[Predicting Credit Defaults Using Time-Series Models with Recursive Neural Networks and XGBoost]]> http://www.open-lab.net/blog/?p=66009 2023-06-14T19:35:53Z 2023-06-07T20:54:46Z Today’s machine learning (ML) solutions are complex and rarely use just a single model. Training models effectively requires large, diverse datasets that may...]]>

Today’s machine learning (ML) solutions are complex and rarely use just a single model. Training models effectively requires large, diverse datasets that may require multiple models to predict effectively. Deploying complex multi-model ML solutions in production can also be challenging. A common example is compatibility issues between frameworks, which can delay insights.

Jiwei Liu <![CDATA[Achieving 100x Faster Single-Cell Modality Prediction with NVIDIA RAPIDS cuML]]> http://www.open-lab.net/blog/?p=56208 2023-06-12T08:45:06Z 2022-10-19T16:00:00Z Single-cell measurement technologies have advanced rapidly, revolutionizing the life sciences. We have scaled from measuring dozens to millions of cells and...]]>

Single-cell measurement technologies have advanced rapidly, revolutionizing the life sciences. We have scaled from measuring dozens to millions of cells and from one modality to multiple high-dimensional modalities. The vast amounts of information at the level of individual cells present a great opportunity to train machine learning models to help us better understand the intrinsic link of cell…

Jiwei Liu <![CDATA[Fast Fine-Tuning of AI Transformers Using RAPIDS Machine Learning]]> http://www.open-lab.net/blog/?p=46373 2024-04-24T23:13:02Z 2022-04-14T03:05:21Z In recent years, transformers have emerged as a powerful deep neural network architecture that has been proven to beat the state of the art in many application...]]>

In recent years, transformers have emerged as a powerful deep neural network architecture that has been proven to beat the state of the art in many application domains, such as natural language processing (NLP) and computer vision. This post uncovers how you can achieve maximum accuracy with the fastest training time possible when fine-tuning transformers. We demonstrate how the cuML support…

Jiwei Liu <![CDATA[Competition and Community Insights from NVIDIA’s Kaggle Grandmasters]]> http://www.open-lab.net/blog/?p=37814 2022-08-21T23:52:43Z 2021-09-23T17:00:00Z In this post, we summarize questions and answers from GTC sessions with NVIDIA’s Kaggle Grandmaster team. Additionally, we answer audience questions we...]]>

In this post, we summarize questions and answers from GTC sessions with NVIDIA’s Kaggle Grandmaster team. Additionally, we answer audience questions we did not get a chance to address during these sessions. Ahmet: I read the competition description and evaluation metric. Then I give myself several days to think about whether I have any novel ideas that I can try. If I do not have any interesting…

Jiwei Liu <![CDATA[Gauss Rank Transformation Is 100x Faster with RAPIDS and CuPy]]> http://www.open-lab.net/blog/?p=32741 2022-08-21T23:51:54Z 2021-06-11T15:00:00Z As explained in the Batch Normalization paper, training neural networks becomes way easier if its input is Gaussian. This is clear. And if your model inputs are...]]>

As explained in the Batch Normalization paper, training neural networks becomes much easier if their input is Gaussian. And if your model inputs are not Gaussian, RAPIDS can transform them to Gaussian in the blink of an eye. Gauss rank transformation is a novel standardization technique to transform input data for training deep neural networks. Recently, we used this technique…
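
As a rough illustration of the idea behind Gauss rank transformation, the sketch below ranks the inputs, rescales the ranks into the open interval (0, 1), and passes them through the inverse CDF of the standard normal distribution. It uses only the Python standard library rather than RAPIDS or CuPy, and the function name `gauss_rank` and the `eps` parameter are assumptions made for this example, not code from the post.

```python
from statistics import NormalDist


def gauss_rank(values, eps=1e-6):
    """Map values to an approximately Gaussian shape using only their ranks.

    Hypothetical helper for illustration; `eps` keeps the scaled ranks
    strictly inside (0, 1) so the inverse CDF stays finite.
    """
    n = len(values)
    if n < 2:
        # With fewer than two values there is no ordering to encode.
        return [0.0] * n

    # Argsort the inputs, then invert it so ranks[i] is the rank of values[i].
    # Ties are broken by original position because sorted() is stable.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank

    std_normal = NormalDist()  # mean 0, standard deviation 1
    transformed = []
    for r in ranks:
        # Rescale rank 0..n-1 to a probability in [eps, 1 - eps].
        p = eps + (1.0 - 2.0 * eps) * r / (n - 1)
        transformed.append(std_normal.inv_cdf(p))
    return transformed


if __name__ == "__main__":
    skewed = [10, 1000, 3, 50]
    print(gauss_rank(skewed))
```

Because only the ordering of the inputs matters, a heavily skewed column and a uniform one with the same ordering map to the same output, which is what makes the technique robust to outliers.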

Jiwei Liu <![CDATA[How to Build a Winning Deep Learning Powered Recommender System-Part 3]]> http://www.open-lab.net/blog/?p=31268 2024-10-28T19:15:46Z 2021-05-06T18:00:00Z Recommender systems (RecSys) have become a key component in many online services, such as e-commerce, social media, news service, or online video streaming....]]>

Recommender systems (RecSys) have become a key component in many online services, such as e-commerce, social media, news services, and online video streaming. However, with the growth in importance, the growth in scale of industry datasets, and more sophisticated models, the bar has been raised for the computational resources required for recommendation systems. To meet the computational demands…

Jiwei Liu <![CDATA[Make Sense of the Universe with Rapids.ai]]> http://www.open-lab.net/blog/?p=13637 2022-08-21T23:39:20Z 2019-02-25T14:00:38Z Classification of astronomical...]]>

Classification of astronomical sources in the night sky is important for understanding the universe. It helps us understand the properties of what makes up celestial systems, from our solar system to the most distant galaxy and everything in between. The Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC) wants to revolutionize the field by automatically classifying 10…
