
Binning Numeric Data with KNIME

In data mining, it is often convenient to work with categorical variables. Some classification methods, such as the basic form of Naïve Bayes classification, require the variables to be categorical. In such situations, we need to convert continuous numeric variables into categorical ones. In this post, we shall see how to bin numeric data into categorical data using KNIME.

Reading the auto_mpg.csv File

Step-1: Add the CSV Reader node from the Node Repository: IO > Read > CSV Reader

Step-2: Right-click on the node and select ‘Configure’

Step-3: In the Settings tab, browse to the location of the auto_mpg.csv file and select it. Adjust the reader options as needed.

Step-4: Click Apply, then OK, and then execute the node

Our dataset is now loaded, and we can view the table by right-clicking on the node and selecting ‘File Table’. The table has 398 rows and 9 columns.
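For readers who want to check the same thing outside KNIME, here is a minimal Python sketch (standard library only) that reads CSV text with a header row and reports its shape. The column names and the two sample rows are made up for illustration; they only mimic a 9-column auto_mpg-style layout and are not taken from the dataset.

```python
import csv
import io


def load_table(stream):
    """Read CSV text with a header row and return (column_names, rows)."""
    reader = csv.reader(stream)
    header = next(reader)                   # first line holds the column names
    rows = [row for row in reader if row]   # skip any empty lines
    return header, rows


# Illustrative data only: the column names and values are assumptions that
# mimic a 9-column auto_mpg-style layout, not rows from the real dataset.
SAMPLE = """mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin,car_name
18,8,307,130,3504,12,70,1,car a
26,4,97,46,1835,20.5,70,2,car b
"""

if __name__ == "__main__":
    header, rows = load_table(io.StringIO(SAMPLE))
    print(f"{len(rows)} rows and {len(header)} columns")  # 2 rows and 9 columns
```

With the real file, opening it with `open("auto_mpg.csv", newline="")` and passing the file object to `load_table` should report the same 398 rows and 9 columns, assuming the file has a single header row.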

Fixed Number of Bins

We will use the Auto-Binner node in KNIME to automatically split a continuous column into a fixed number of bins.

Step-1: Add the Auto-Binner node from the Node Repository: Manipulation > Column > Binning > Auto-Binner

Step-2: Right-click on the node and select ‘Configure’

Step-3: In the Auto Binner Settings tab, include the displacement column

Step-4: In the Binning Method section, select Fixed number of bins, enter 3 for the number of bins, and select frequency for Equal

Step-5: In the Bin Naming section, select Numbered

Step-6: Click Apply, then OK, and then execute the node

Step-7: After execution, right-click on the node and select Binned Data to view the updated dataset with the newly added binned column

We observe that the displacement column, which was continuous, has now been binned into three categories (Bin 1, Bin 2, Bin 3) of roughly equal frequency, represented by the displacement [Binned] column.
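The idea behind equal-frequency binning can be sketched in a few lines of Python. This is a minimal illustration of the technique, not KNIME's implementation: it sorts the values, places each cut point at the end of an equal-count slice, and assigns every value to a bin of the form (lower, upper]. How the Auto-Binner places its edges and handles ties may differ, and the displacement values in the demo are hypothetical.

```python
import bisect


def equal_frequency_edges(values, n_bins):
    """Return the n_bins - 1 inner cut points for equal-frequency binning."""
    if not 1 <= n_bins <= len(values):
        raise ValueError("need 1 <= n_bins <= number of values")
    s = sorted(values)
    n = len(s)
    # The k-th cut is the last value of the k-th slice of about n / n_bins values.
    return [s[(k * n) // n_bins - 1] for k in range(1, n_bins)]


def assign_bins(values, edges):
    """Label each value 'Bin 1', 'Bin 2', ...; a value equal to a cut point
    goes into the lower bin, so the bins are (lower, upper]."""
    return [f"Bin {bisect.bisect_left(edges, v) + 1}" for v in values]


if __name__ == "__main__":
    # Hypothetical displacement values, for illustration only.
    displacement = [350, 97, 200, 121, 318, 98, 250, 140, 107]
    edges = equal_frequency_edges(displacement, 3)
    print(edges)  # [107, 200]
    print(assign_bins(displacement, edges))
```

Because equal values always share a cut point here, ties land in the same bin, so with real data the bin counts can differ somewhat; this is why the result is only roughly equal in frequency.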

Quantile Based Binning

We will use the Auto-Binner node in KNIME to automatically convert a continuous column into a categorical column whose bins are defined by quantiles.

Step-1: Add the Auto-Binner node from the Node Repository: Manipulation > Column > Binning > Auto-Binner

Step-2: Right-click on the node and select ‘Configure’

Step-3: In the Auto Binner Settings tab, include the displacement column

Step-4: In the Binning Method section, select Sample quantiles and enter 0.0, 0.25, 0.5, 0.75, 1.0 in the Quantiles text box

Step-5: In the Bin Naming section, select Numbered

Step-6: Click Apply, then OK, and then execute the node

Step-7: After execution, right-click on the node and select Binned Data to view the updated dataset with the newly added binned column

We observe that the displacement column, which was continuous, has now been binned into four categories (Bin 1, Bin 2, Bin 3, Bin 4), represented by the displacement [Binned] column. The five quantile values act as the boundaries of these four bins.
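Quantile-based binning can be sketched the same way. The Python below estimates each requested quantile by linear interpolation between neighbouring sorted values (the default definition in NumPy and R, often called type 7) and uses the inner quantiles as cut points. This estimator is an assumption for illustration: KNIME's Auto-Binner may use a different quantile definition or boundary convention, and the demo values are hypothetical.

```python
import bisect
import math


def sample_quantile(sorted_vals, p):
    """Quantile p (0 <= p <= 1) of already sorted data, using linear
    interpolation between neighbouring order statistics."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_bins(values, probs):
    """Return (cut_points, labels); len(probs) quantiles define len(probs) - 1
    bins, and a value equal to an inner cut point goes into the lower bin."""
    if not values:
        raise ValueError("values must not be empty")
    s = sorted(values)
    cuts = [sample_quantile(s, p) for p in sorted(probs)]
    inner = cuts[1:-1]  # the first and last quantiles are the outer bounds
    labels = [f"Bin {bisect.bisect_left(inner, v) + 1}" for v in values]
    return cuts, labels


if __name__ == "__main__":
    # Hypothetical displacement values, for illustration only.
    displacement = [350, 97, 200, 121, 318, 98, 250, 140, 107]
    cuts, labels = quantile_bins(displacement, [0.0, 0.25, 0.5, 0.75, 1.0])
    print(cuts)  # [97.0, 107.0, 140.0, 250.0, 350.0]
    print(labels)
```

As in the KNIME example, the five probabilities 0.0, 0.25, 0.5, 0.75 and 1.0 yield four bins.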

Interval Based Binning

Suppose we want to create a categorical variable from a numeric variable based on user-specified intervals. For such cases, we will use the Numeric Binner node in KNIME.

Step-1: Add the Numeric Binner node from the Node Repository: Manipulation > Column > Binning > Numeric Binner

Step-2: Right-click on the node and select ‘Configure’

Step-3: In the Intervals tab, select the cylinders column on the left

Step-4: In the right section, add Bin 1, Bin 2 and Bin 3, and specify the lower and upper bounds of each bin at the bottom of the right section

Step-5: Select the Append new column check box and enter cylinders_binned as the name of the new column

Step-6: Click Apply, then OK, and then execute the node

Step-7: After execution, right-click on the node and select Binned Data to view the updated dataset with the newly added binned column

We observe that the cylinders column has now been binned into three categories (Bin 1, Bin 2, Bin 3) based on the specified intervals, represented by the cylinders_binned column.
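The interval-based approach can be approximated with a small lookup over user-defined intervals. The Python sketch below appends a new column (here cylinders_binned) and leaves the original column unchanged, in the spirit of the Append new column option. The interval bounds are hypothetical, since the post does not state which bounds were entered in Step-4, and the sketch treats every interval as (lower, upper], which may not match the boundary settings chosen in the node dialog.

```python
import math

# Hypothetical bins; the post does not say which bounds were used.
# Each entry is (label, lower, upper) describing the interval (lower, upper].
INTERVALS = [
    ("Bin 1", -math.inf, 4),
    ("Bin 2", 4, 6),
    ("Bin 3", 6, math.inf),
]


def interval_bin(value, intervals):
    """Return the label of the first interval containing value, else None."""
    for label, lower, upper in intervals:
        if lower < value <= upper:
            return label
    return None  # value lies outside every interval


def append_binned_column(rows, column, new_column, intervals):
    """Return copies of the dict rows with new_column holding each bin label;
    the original column is left unchanged."""
    return [{**row, new_column: interval_bin(row[column], intervals)} for row in rows]


if __name__ == "__main__":
    rows = [{"cylinders": c} for c in (3, 4, 5, 6, 8)]
    for row in append_binned_column(rows, "cylinders", "cylinders_binned", INTERVALS):
        print(row)
```

Returning None for values outside every interval makes gaps in the chosen bounds easy to spot, rather than silently forcing such values into the nearest bin.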

About V2K
