Boosting: Introduction


Machine learning is rapidly evolving, and so is the available data. One of the main challenges in machine learning is creating a model with high prediction accuracy. Single-model approaches (using only one model for predictions), like Decision Trees or Linear Regression, often struggle with complex datasets.

Boosting, an ensemble learning approach, can help create a more robust model for such problems.

By the end of this article, you should be able to:

  1. Explain what Boosting is.
  2. Describe how Boosting works.
  3. Use Boosting and AdaBoost in Python.

Note: Don’t worry if some of these terms are new to you. You will understand them by the end of this article.

But before you start this article, you should know about:

  1. The concept of Ensemble Learning: combining multiple models to create a single model.

Okay, let’s start.

What is Boosting?

When boosting was conceptualized in the late 1980s, a question emerged: “Can we group weak learners together to create stronger learners?” Zhou (2012) notes that this question is important because it is much easier to create a weak learner than a strong one.

  1. Weak learner: a model with limited predictive power, i.e., one that performs only slightly better than a random guess.
  2. Strong learner: a model with good predictive power, i.e., one that performs substantially better than a random guess.

Since we group (ensemble) weak models together to create a single strong model, Boosting is classified as a type of Ensemble Learning.

So Boosting is not a single algorithm; it is a general approach in which multiple models are combined to create a strong model. In Boosting, the models are trained sequentially (one after another).

How Boosting Works

The boosting algorithm starts by creating a first base learner. This base learner can use any algorithm; let’s say it uses a Decision Tree.

1. The first base learner is trained on some of the observations from the dataset.

2. The trained base learner is then tested on all observations.

3. The incorrectly classified observations are passed on to train the second base learner.

4. Base learner 2 is again checked on the original dataset, and the incorrectly classified observations are passed to the next base learner.

5. This process is repeated until the specified number of base learners has been trained.

As you may have noticed, the base learners manipulate the training data that the next learner sees. This is the basic idea of Boosting: instead of manipulating the learner, the data is manipulated. The toy sketch below illustrates this sequential loop.
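To make the steps concrete, here is a minimal toy sketch in Python. It is only an illustration of steps 1–5 above (real boosting algorithms reweight observations rather than literally training only on the mistakes), and the dataset and the limit of three learners are arbitrary choices for the example:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# A synthetic dataset, purely for illustration
X, y = make_classification(n_samples=500, random_state=0)

learners = []
X_cur, y_cur = X, y
for _ in range(3):  # a fixed, small number of base learners
    # Step 1: train a base learner on the current observations
    stump = DecisionTreeClassifier(max_depth=1).fit(X_cur, y_cur)
    learners.append(stump)
    # Step 2: test it on ALL observations
    wrong = stump.predict(X) != y
    if not wrong.any():
        break
    # Steps 3-4: pass the misclassified observations to the next learner
    X_cur, y_cur = X[wrong], y[wrong]
```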

There are multiple algorithms that follow the boosting approach of combining multiple models to create a strong model. Adaptive Boosting (AdaBoost), Gradient Boosting, Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) are some examples of algorithms implemented with the boosting approach.

In this article, we will start by implementing AdaBoost in scikit-learn. AdaBoost was the first practical boosting algorithm.

AdaBoost

The AdaBoost algorithm was proposed and implemented by Freund & Schapire in 1995. In AdaBoost, each observation is assigned a weight, and observations with higher weights get more attention from the next learner. In other words, AdaBoost adaptively boosts the next learner so that it corrects the mistakes of the previous ones, improving overall accuracy.
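To make this slightly more concrete, below is a sketch of the standard AdaBoost weight update for a single round (the derivation comes in the next article; the function name and the {-1, +1} label encoding are illustrative choices):

```python
import numpy as np

def adaboost_reweight(weights, y_true, y_pred):
    """One round of the standard AdaBoost weight update.

    weights: current sample weights (non-negative, summing to 1)
    y_true, y_pred: labels encoded as -1 or +1
    """
    # Weighted error of the current weak learner
    err = np.sum(weights[y_true != y_pred])
    err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against division by zero
    # The learner's "say" in the final vote: large when the error is small
    alpha = 0.5 * np.log((1 - err) / err)
    # Misclassified points get heavier, correct ones get lighter
    new_weights = weights * np.exp(-alpha * y_true * y_pred)
    return new_weights / new_weights.sum(), alpha
```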

This might not be completely clear yet, but we will cover AdaBoost in depth in the next article, where everything will fall into place.

Here, we will start by familiarizing ourselves with how easy it is to implement it in Python using scikit-learn.
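Here is a minimal sketch of such a model. The dataset and hyperparameters are illustrative choices; also note that on scikit-learn versions before 1.2, the base learner is passed via the `base_estimator` parameter instead of `estimator`:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Any labeled classification dataset works; this one ships with scikit-learn
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# AdaBoost with a shallow Decision Tree (a "stump") as the base learner
model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=42,
)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```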

This is it. This Python code is all you need to get started with AdaBoost for a classification problem.

Notice that we are using a Decision Tree inside AdaBoost while creating a model.

AdaBoost Implementation in Python

Here is the Kaggle Notebook, where I have implemented a complete ML workflow with AdaBoost, from importing data to creating a model and testing it. Please go through it once, and get ready for the next article.

Some of the disadvantages of Boosting algorithms are:

  1. They are vulnerable to outliers.
  2. They are computationally expensive on large datasets.
  3. They are hard to implement for real-time applications.

Amazing Articles and Videos on Boosting

  1. Understanding Boosting in Machine Learning: A Comprehensive guide
  2. What is boosting?

Find more in the References and More Materials section below.
