LIVE TRAINING: Gradient Boosting for Prediction and Inference
You can still view the live recording and all course materials by registering below.
Training duration: 4 hours
Sign up for a Basic or Premium Plan and get an additional 10-35% discount off live training
Brian Lucena, PhD
Principal | Numeristical
- Explain how Gradient Boosting methods work as an ensemble of decision trees
- Understand the various details and choices available for Gradient Boosting models
- Know the various Gradient Boosting packages and the capabilities, strengths, and weaknesses of each
- Know how to approach a data set, build a simple model, and then improve it
- Know how to experiment with the parameters and use grid search in a principled way
- Know how to evaluate, interpret, calibrate, and explain your model to a lay audience
This intensive course focuses on using Gradient Boosting for classification and regression problems. It is a hands-on workshop built around real data sets, in which participants will gain valuable experience training, evaluating, and drawing conclusions from Gradient Boosting models.
By the end of the course, participants will be confident that they understand the details and parameters behind Gradient Boosting, and will be able to present, critique, and defend the models they create.
Lesson 1: How Gradient Boosting works
We will build up the concepts step by step: from the definition of a Decision Tree, through Random Forests, to how Gradient Boosting can be seen as gradient descent in which each step is a tree. Along the way, we will highlight the small details that make a big difference when configuring Gradient Boosting models.
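To make the "gradient descent with trees" view concrete, here is a minimal from-scratch sketch for squared-error regression (the function names and defaults are illustrative, not any package's API). Each boosting step fits a small tree to the current residuals, which are exactly the negative gradient of the squared-error loss:

```python
# A minimal sketch (not the course's code) of Gradient Boosting as
# gradient descent where each step is a tree. For squared-error loss,
# the negative gradient is simply the residual y - prediction.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    base = y.mean()                    # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred           # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)         # each gradient step is a small tree
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees

def predict(base, trees, X, learning_rate=0.1):
    return base + learning_rate * sum(tree.predict(X) for tree in trees)
```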
Lesson 2: Review of Gradient Boosting Packages
We will present and use all the major Gradient Boosting implementations, including scikit-learn, XGBoost, CatBoost, LightGBM, and StructureBoost, demonstrating the relative strengths and weaknesses of each one.
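As a preview of how interchangeable the packages feel in practice, here is a minimal sketch of the scikit-learn-style fit/score interface that four of them share (StructureBoost is omitted here, and the hyperparameter values are illustrative, not recommendations):

```python
# A minimal sketch: the major packages expose a similar interface.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "scikit-learn": HistGradientBoostingClassifier(max_iter=200),
    "XGBoost": XGBClassifier(n_estimators=200, learning_rate=0.1),
    "LightGBM": LGBMClassifier(n_estimators=200, learning_rate=0.1),
    "CatBoost": CatBoostClassifier(iterations=200, learning_rate=0.1, verbose=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```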
Lesson 3: Training in Practice, Setting Parameters, Cross-Validation
We will work hands-on with several data sets and demonstrate best practices for exploring data and experimenting with model parameters. We will show the value of early stopping, and demonstrate how best to use it in a cross-validated setting. Along the way, we will write a grid search function from scratch that avoids many of the pitfalls in existing functions and demonstrate how to use it in a "targeted" fashion.
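As a small taste, here is a sketch of early stopping with LightGBM; the exact mechanism differs across packages and versions, and stopping_rounds=20 and the other settings are illustrative choices, not recommendations:

```python
# A minimal sketch of early stopping against a held-out validation fold.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Set n_estimators high and let the validation loss decide when to stop
model = lgb.LGBMClassifier(n_estimators=2000, learning_rate=0.05)
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],                           # fold to monitor
    callbacks=[lgb.early_stopping(stopping_rounds=20)],  # stop after 20 flat rounds
)
print("best iteration:", model.best_iteration_)
```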
Lesson 4: Evaluating and Understanding the Model
After building a model, we will work through various ways to analyze and understand it. We will review all the major metrics used, discuss when they are relevant or irrelevant, and learn how to put them in their proper context. This includes:
- Looking at ICE plots to understand the practical impact of individual variables and whether (and how) they interact with others
- Using the SHAP package to assign meaningful "reasons" to a specific prediction (see the sketch after this list)
- Examining the calibration of the model, understanding the consequences of poor calibration, and learning how to fix it
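As one illustration of the last two items, here is a minimal sketch, on synthetic data with illustrative settings, of extracting per-feature contributions with the shap package and checking calibration with scikit-learn's calibration_curve:

```python
# A minimal sketch of explaining one prediction with SHAP values and
# checking probability calibration on a held-out test set.
import shap
from lightgbm import LGBMClassifier
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LGBMClassifier(n_estimators=300).fit(X_train, y_train)

# Per-feature contributions ("reasons") for one test-set prediction
explainer = shap.TreeExplainer(model)
print(explainer.shap_values(X_test[:1]))

# Compare predicted probabilities to observed frequencies in 10 bins;
# large gaps between the two indicate poor calibration
probs = model.predict_proba(X_test)[:, 1]
frac_positive, mean_predicted = calibration_curve(y_test, probs, n_bins=10)
print(list(zip(mean_predicted.round(2), frac_positive.round(2))))
```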
By the end, you will be able to describe how the model works, which variables have the most impact, and in which scenarios one should be cautious when applying the model.
This course is geared toward data scientists of all levels who wish to gain a deep understanding of Gradient Boosting and how to apply it to real-world situations. The ideal participant will have some experience building models.
You should know the Python data science toolkit (numpy, pandas, scikit-learn, matplotlib) and have experience fitting models on training sets, making predictions on test sets, and evaluating the quality of the model with metrics.
Access to the on-demand recording
Certificate of completion