Home / KNIME / Sub Setting Data in KNIME

Sub Setting Data in KNIME

In this post, we shall see the multiple ways of sub setting data in KNIME.

Reading auto_mpg.csv file

Step-1: Add the CSV Reader node from Node Repository: IO > Read > CSV Reader

Step-2: Right click on the node and select ‘Configure’

Step-3: In the Settings tab browse and choose the data from where it is located. Select the appropriate reader options as applicable.

Step-4: Click Apply and OK and then Execute the node

Now our dataset is loaded and we can view the table by right clicking on the node and selecting ‘File Table’. We can see that our table has 398 rows and 9 columns.

Dropping/Excluding a Column

We will use Column Filter node to exclude the Origin column from our auto_mpg dataset

Step-1: Add the Column Filter node from Node Repository: Manipulation > Column > Filter > Column Filter

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Column Filter tab exclude Origin column and include all other columns

Step-5: Click Apply and OK and then Execute the node

Step-6: Upon execution right click on the node and select Filtered table to view the filtered dataset containing only 8 columns with the origin column now being dropped out

Including/Excluding row/s based on different criteria

We will use Row Filter node and utilize different criteria available for including or excluding rows from columns.

Criteria – 1:  Based on Row Number

Step-1: Add the Row Filter node from Node Repository: Manipulation > Row > Filter > Row Filter

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Filter Criteria tab, we can choose to either Include or Exclude rows by number. In this case, we will select Include rows by number

Step-5: In the Row number range section enter 10 as first row number and 50 as last row number for the row number range

Step-6: Click Apply and OK and then Execute the node

Step-7: Upon execution right click on the node and select Filtered to view the filtered dataset with rows starting from 10 to 50 only being included

Criteria – 2:  Based on Pattern Matching

Step-1: Add the Row Filter node from Node Repository: Manipulation > Row > Filter > Row Filter

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Filter Criteria tab, we can choose to either Include or Exclude rows by attribute value. In this case, we will select Include rows by attribute value

Step-5: In the Column to test dropdown select model_year column

Step-6: In the Matching criteria select use pattern matching and enter 70

Step-7: Click Apply and OK and then Execute the node

Step-8: Upon execution right click on the node and select Filtered to view the filtered dataset with rows containing model_year as 70 only being included

Criteria – 3:  Based on Range Check

Step-1: Add the Row Filter node from Node Repository: Manipulation > Row > Filter > Row Filter

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Filter Criteria tab, we can choose to either Include or Exclude rows by attribute value. In this case, we will select Include rows by attribute value

Step-5: In the Column to test dropdown select model_year column

Step-6: In the Matching criteria select use range checking and enter lower bound as 70 and upper bound as 75

Step-7: Click Apply and OK and then Execute the node

Step-8: Upon execution right click on the node and select Filtered to view the filtered dataset with rows containing model_year from 70 to 75 only being included

Criteria – 4:  Based on Wild Card Pattern Matching

Step-1: Add the Row Filter node from Node Repository: Manipulation > Row > Filter > Row Filter

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Filter Criteria tab, we can choose to either Include or Exclude rows by attribute value. In this case, we will select Exclude rows by attribute value

Step-5: In the Column to test dropdown select car_name column

Step-6: In the Matching criteria select use pattern matching and enter *ford* and select contains wild cards

Step-7: Click Apply and OK and then Execute the node

Step-8: Upon execution right click on the node and select Filtered to view the filtered dataset with all the rows containing “ford” in the car_name column being excluded

Criteria – 5:  Based on Row ID

Step-1: Add the Row Filter node from Node Repository: Manipulation > Row > Filter > Row Filter

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Filter Criteria tab, we can choose to either Include or Exclude rows by number. In this case, we will select Include rows by row ID

Step-5: In the Regular expression section enter Row22

Step-6: Click Apply and OK and then Execute the node

Step-7: Upon execution right click on the node and select Filtered to view the filtered dataset with only the Row22 being included

Including/Excluding row/s based on multiple rules

We will use Rule-based Row Filter node to include or exclude rows based on multiple rules being applied on the columns.

Step-1: Add the Rule-based Row Filter node from Node Repository: Manipulation > Row > Filter > Rule-based Row Filter

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Rule Editor tab, we shall provide the rules. In this case, the rule is to select rows where the value in the cylinders column is either 6 or 8

Step-5: We can choose to either include or exclude the true matches. In this case, we will select Exclude TRUE matches

Step-6: Click Apply and OK and then Execute the node

Step-7: Upon execution right click on the node and select Filtered to view the filtered dataset with all the rows containing the cylinders column value either 6 or 8 only being included

Subsetting rows based on nominal values

We will use nominal value row filter to include rows based on nominal values in the model year column. As the name suggested nominal value row filter can be applied only on string columns containing categorical values. Therefore, we will first convert the model_year column into string column using the number to string node. Then subsequently, we apply the filter.

Step-1: Add the Rule-based Row Filter node from Node Repository: Manipulation > Row > Filter > Nominal Value Row Filter

Step-2: Connect it to the CSV Reader node

Step-3: Right click on the node and select ‘Configure’

Step-4: In the Selection tab, select model_year column from the dropdown and include 70, 75 and 80 under nominal value selection section

Step-5: Click Apply and OK and then Execute the node

Step-7: Upon execution right click on the node and select Included to view the filtered dataset with rows containing 70, 75 and 80 as values in the model_year column only being included

About V2K

Leave a Reply

Your email address will not be published. Required fields are marked *