
Normalizing Data with KNIME

Many data mining techniques involve distance computations, so it is important that the variables are standardized; otherwise, variables measured on larger scales will dominate the model. In this post, we will see how to normalize or standardize the variables in a dataset using KNIME.
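To see why scaling matters, here is a quick, hypothetical sketch in Python. The values are made up, but the column names mirror the auto-mpg data used below: when one variable spans thousands (weight) and another spans tens (mpg), Euclidean distance is driven almost entirely by the larger-scale variable.

```python
import math

# Hypothetical illustration: Euclidean distance between two cars before scaling.
car_a = {"mpg": 18.0, "weight": 3500.0}   # assumed values, for illustration only
car_b = {"mpg": 32.0, "weight": 2100.0}

dist = math.sqrt((car_a["mpg"] - car_b["mpg"]) ** 2 +
                 (car_a["weight"] - car_b["weight"]) ** 2)
print(dist)  # ~1400.07: the weight gap (1400) dwarfs the mpg gap (14)
```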

Download the auto_mpg.csv dataset from here

Reading auto_mpg.csv file

Step-1: Add the CSV Reader node from Node Repository: IO > Read > CSV Reader

Step-2: Right click on the node and select ‘Configure’

Step-3: In the Settings tab, browse to the location of the downloaded file and select it. Choose the appropriate reader options as applicable.

Step-4: Click Apply and OK and then Execute the node

The dataset is now loaded, and we can view the table by right-clicking the node and selecting ‘File Table’. The table has 398 rows and 9 columns.
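For readers who want to sanity-check the workflow outside KNIME, a minimal pandas sketch performs the same load. The file name and path are assumptions based on the dataset used in this post:

```python
import pandas as pd

# Assumed local path; adjust to wherever auto_mpg.csv was downloaded.
df = pd.read_csv("auto_mpg.csv")
print(df.shape)  # expected (398, 9), matching the KNIME File Table view
```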

Viewing Summary Statistics

We will now look at the summary statistics of the numeric variables in the dataset.

Step-1: Add the Statistics node from Node Repository: Analytics > Statistics > Statistics

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Options tab include all the variables for which you want the summary statistics to be computed.

Step-5: Click Apply and OK and then Execute the node

The output of the Statistics node highlights three variables, namely displacement, horsepower and weight, whose ranges are much larger than those of the other columns and which should therefore be normalized.
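The same check can be reproduced outside KNIME. Here is a minimal sketch, assuming the pandas DataFrame from the earlier snippet; the large-range columns stand out immediately in the summary:

```python
import pandas as pd

# Assumes df was loaded as in the previous snippet.
# Some copies of the auto-mpg data encode missing horsepower values as '?';
# coerce to numeric just in case so describe() reports numeric statistics.
df["horsepower"] = pd.to_numeric(df["horsepower"], errors="coerce")

print(df[["displacement", "horsepower", "weight"]].describe())
```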

Min-Max Normalization

We will now see how to normalize the values in a column within a given minimum and maximum scale.

Step-1: Add the Normalizer node from Node Repository: Manipulation > Column > Transform > Normalizer

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Methods tab, include the horsepower and weight columns

Step-5: In the Settings section select Min-Max Normalization and enter Min as 0.0 and Max as 1.0

Step-6: Click Apply and OK and then Execute the node

Step-7: Upon execution right click on the node and select ‘Normalized table’ to view the updated dataset with the normalized columns
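Under the hood, min-max normalization rescales each value linearly: x' = (x - min) / (max - min) * (new_max - new_min) + new_min. Below is a small Python sketch of the same transformation (not KNIME's internal code, just an illustration with assumed horsepower values):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

# Example with a few illustrative horsepower values.
print(min_max_normalize([46.0, 95.0, 230.0]))  # [0.0, ~0.266, 1.0]
```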

Z-Score Normalization

We will now see how to perform Z-Score normalization.

Step-1: Add another Normalizer node from Node Repository: Manipulation > Column > Transform > Normalizer

Step-2: Connect it to the previous Normalizer node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Methods tab, include the displacement column

Step-5: In the Settings section select Z-Score Normalization (Gaussian)

Step-6: Click Apply and OK and then Execute the node

Step-7: Upon execution right click on the node and select ‘Normalized table’ to view the updated dataset with the normalized columns
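Z-score normalization replaces each value with its distance from the column mean, measured in standard deviations: z = (x - mean) / std. Here is a minimal sketch of the calculation (again an illustration with assumed displacement values, not KNIME's implementation):

```python
import statistics

def z_score_normalize(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / std for v in values]

# Example with a few illustrative displacement values.
print(z_score_normalize([97.0, 250.0, 455.0]))
```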

We can observe that the values in the horsepower and weight columns are now rescaled, with 0 as the lowest value and 1 as the highest, and that the displacement column has been standardized using the Z-score method.
