Creating Dummy Variables with KNIME

Dummy variables are an effective way of using categorical variables in data mining methods such as K Nearest Neighbours (KNN) and in regression (for example, to model interaction effects). These methods expect numeric inputs, so categorical variables need to be converted into dummy variables first. In this post, we will see how to create a dummy variable for each unique value of a categorical variable using KNIME.

Reading auto_mpg.csv file

Step-1: Add the CSV Reader node from Node Repository: IO > Read > CSV Reader

Step-2: Right click on the node and select ‘Configure’

Step-3: In the Settings tab, browse to the location of the data file and select it. Set the reader options as applicable.

Step-4: Click Apply and OK and then Execute the node

Now our dataset is loaded and we can view the table by right clicking on the node and selecting ‘File Table’. We can see that our table has 398 rows and 9 columns.
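For readers who want to see what the CSV Reader step amounts to outside KNIME, here is a minimal Python sketch using only the standard library. It is not what KNIME runs internally. The column names and the three sample rows are illustrative assumptions standing in for auto_mpg.csv; with the real file you would open it from disk instead, and the counts would be 398 rows and 9 columns.

```python
import csv
import io

# Hypothetical stand-in for auto_mpg.csv: assumed column names and
# illustrative values, not taken from the real file.
SAMPLE_CSV = """mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin,car_name
18.0,8,300.0,130,3500,12.0,70,1,car a
24.0,4,110.0,95,2400,15.0,70,3,car b
19.0,3,70.0,97,2300,13.5,72,3,car c
"""


def read_table(text):
    """Parse CSV text into a list of column names and a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


header, rows = read_table(SAMPLE_CSV)
print(len(rows), "rows,", len(header), "columns")
```

Note that Python's csv module returns every value as a string, whereas KNIME assigns a column type to each column, which is why the next section converts cylinders explicitly.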

Number to String

We will first use the Number To String node in KNIME to convert the numeric cylinders column into a string column, so that it can be treated as a categorical variable.

Step-1: Add the Number To String node from Node Repository: Manipulation > Column > Convert & Replace > Number To String

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Options tab, include the cylinders column

Step-5: Click Apply and OK and then Execute the node

Step-6: Upon execution, right click on the node and select ‘Transformed input’ to view the updated dataset, in which the cylinders column has been converted to the string column type
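The conversion performed in this section can be sketched in plain Python as follows. This is only an illustration of the idea, not KNIME's implementation; the function name number_to_string and the two sample rows are hypothetical.

```python
def number_to_string(rows, column):
    """Return copies of the rows with `column` converted from number to string."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[column] = str(new_row[column])
        converted.append(new_row)
    return converted


# Illustrative rows in which cylinders is stored as an integer.
rows = [{"mpg": 18.0, "cylinders": 8}, {"mpg": 24.0, "cylinders": 4}]
result = number_to_string(rows, "cylinders")
print(result[0]["cylinders"], type(result[0]["cylinders"]).__name__)  # 8 str
```

Only the selected column changes type; the other columns, like mpg here, stay numeric.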

Creating Dummy Variables

We will now use the One to Many node to create a dummy variable for each unique value in the categorical string column.

Step-1: Add the One to Many node from Node Repository: Manipulation > Column > Transform > One to Many

Step-2: Connect it to the Number to String node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Columns to transform tab, include the cylinders column

Step-5: Click Apply and OK and then Execute the node

Step-6: Upon execution, right click on the node and select ‘Processed data’ to view the updated dataset, which now contains five new dummy variables, one for each unique value (3, 4, 5, 6, 8) in the cylinders column
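To make the result of this step concrete, here is a rough Python sketch of the same idea: one 0/1 indicator column per distinct value. It is an illustration under assumptions rather than a copy of the node's behaviour. The function name one_to_many, the sample rows, and the column_value naming of the new columns are all choices made for this example, and the columns KNIME produces may be named differently.

```python
def one_to_many(rows, column):
    """Append one 0/1 indicator column per distinct value of `column`."""
    values = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = dict(row)  # keep the original column alongside the dummies
        for value in values:
            new_row[f"{column}_{value}"] = 1 if row[column] == value else 0
        encoded.append(new_row)
    return values, encoded


# Illustrative rows covering the five cylinder counts mentioned above.
rows = [{"cylinders": c} for c in ["8", "4", "6", "3", "5", "4"]]
values, encoded = one_to_many(rows, "cylinders")
print(values)      # ['3', '4', '5', '6', '8']
print(encoded[0])  # cylinders_8 is 1, the other four dummies are 0
```

Each row has exactly one dummy set to 1, the one matching its cylinders value.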
