NVIDIA leverages data science and machine learning to optimize chip manufacturing and operations workflows, from wafer fabrication and circuit probing to packaged chip testing. These stages generate terabytes of data, and turning that data into actionable insights at speed and scale is critical to ensuring quality, throughput, and cost efficiency. Over the years, we've developed robust ML pipelines…
Tree-ensemble models remain a go-to for tabular data because they're accurate, comparatively inexpensive to train, and fast. But deploying Python inference on CPUs quickly becomes the bottleneck once you need sub-10 ms latency or millions of predictions per second. The Forest Inference Library (FIL) first appeared in cuML 0.9 in 2019, and has always been about one thing: blazing-fast…
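As a rough illustration of the workflow FIL enables (not code from the post itself; the model path is a placeholder and the exact ForestInference.load arguments vary between cuML releases), GPU inference on a previously trained XGBoost model might look like this:

import numpy as np
from cuml import ForestInference

# Load a pre-trained XGBoost model (saved with booster.save_model) into FIL.
# "xgb_model.json" is a placeholder path; output_class=True returns class
# labels rather than raw scores for a classifier.
fil_model = ForestInference.load(
    "xgb_model.json", output_class=True, model_type="xgboost_json"
)

# Inference runs on the GPU; inputs can be NumPy arrays, CuPy arrays, or cuDF DataFrames.
X = np.random.rand(1_000_000, 32).astype(np.float32)
predictions = fil_model.predict(X)

The key point of the sketch is that the trees are trained anywhere (CPU, another framework) and only the prediction step is handed to the GPU.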
Time series forecasting is a powerful data science technique used to predict future values based on data points from the past. Open source Python libraries like skforecast make it easy to run time series forecasts on your data. They allow you to "bring your own" regressor that is compatible with the scikit-learn API, giving you the flexibility to work seamlessly with the model of your choice.
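As a hedged sketch of that "bring your own regressor" pattern (the series below is synthetic, and the ForecasterAutoreg import path has moved between skforecast releases), a recursive forecast with a scikit-learn regressor might look like:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

# Synthetic monthly series standing in for your own time-indexed data.
y = pd.Series(
    data=[float(i) for i in range(100)],
    index=pd.date_range("2017-01-01", periods=100, freq="MS"),
    name="y",
)

# Any regressor exposing the scikit-learn fit/predict API can be plugged in here.
forecaster = ForecasterAutoreg(regressor=RandomForestRegressor(random_state=0), lags=12)
forecaster.fit(y=y)

# Recursive forecast of the next 6 periods.
predictions = forecaster.predict(steps=6)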
The success of deep neural networks in multiple areas has prompted a great deal of thought and effort on how to deploy these models efficiently in real-world applications. However, efforts to accelerate the deployment of tree-based models (including random forest and gradient-boosted models) have received less attention, despite their continued dominance in tabular data analysis and their…
This post was originally published on the RAPIDS AI blog. The RAPIDS Forest Inference Library, affectionately known as FIL, dramatically accelerates inference (prediction) for tree-based models, including gradient-boosted decision tree models (like those from XGBoost and LightGBM) and random forests. (For a deeper dive into the library overall, check out the original FIL blog.)
This post was originally published on the RAPIDS AI blog. Random forests are a popular machine learning technique for classification and regression problems. By building multiple independent decision trees, they reduce the problems of overfitting seen with individual trees. In this post, I review the basic random forest algorithms, show how their training can be parallelized on NVIDIA…
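For a rough sense of what GPU-accelerated random forest training looks like (a minimal sketch on synthetic data using cuML's ensemble and datasets modules, not the benchmark code from the post):

from cuml.datasets import make_classification
from cuml.ensemble import RandomForestClassifier

# Synthetic classification data generated directly on the GPU.
X, y = make_classification(
    n_samples=100_000, n_features=20, n_informative=10, random_state=0
)

# Each tree is built independently on a bootstrap sample of the data, which is
# what makes random forest training straightforward to parallelize on GPUs.
clf = RandomForestClassifier(n_estimators=100, max_depth=16, random_state=0)
clf.fit(X, y)
predictions = clf.predict(X)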
This blog dives into a theoretical machine learning concept called the bias-variance decomposition. The decomposition examines the expected generalization error for a given learning algorithm and a given data source. It helps us understand questions about generalization, which concerns overfitting, or the ability of a model learned on training data to provide effective…
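For squared-error loss the decomposition has a standard closed form. Assuming observations y = f(x) + ε with zero-mean noise of variance σ², and a model f̂ fit on a randomly drawn training set D, it reads:

\mathbb{E}_{D,\varepsilon}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(f(x) - \mathbb{E}_D[\hat{f}(x)]\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\left(\hat{f}(x) - \mathbb{E}_D[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}

High bias corresponds to underfitting, high variance to overfitting, and the noise term sets a floor that no model can improve on.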