Tag Archives: dataset manipulation

104.2.7 Identifying and Removing Duplicate values from dataset in Python

In this post we will see how to identify and remove duplicate values from a dataset. We will use the Bill dataset from the Telecom Data Analysis folder.

Identifying & Removing Duplicates

In [90]:
bill_data=pd.read_csv("datasets\\Telecom Data Analysis\\Bill.csv")
bill_data.shape
Out[90]:
(9462, 7)
In [87]:
#Identify duplicate records in the data
dupes=bill_data.duplicated()
sum(dupes)
Out[87]:
10
In [88]: …
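The excerpt stops before the removal step. As a minimal sketch of how the two steps might fit together, the example below uses a small made-up table rather than Bill.csv; the name bill_sample and its columns cust_id and amount are assumptions for illustration only and do not come from the original dataset.

```python
import pandas as pd

# Hypothetical stand-in for the bill data; names and values are illustrative only
bill_sample = pd.DataFrame({
    "cust_id": [1, 2, 2, 3, 3],
    "amount": [100, 250, 250, 80, 95],
})

# duplicated() flags each row that fully repeats an earlier row
dupes = bill_sample.duplicated()
n_dupes = int(dupes.sum())

# drop_duplicates() keeps the first occurrence of each repeated row
bill_unique = bill_sample.drop_duplicates()

# Duplicates can also be judged on selected columns only
bill_unique_by_id = bill_sample.drop_duplicates(subset=["cust_id"])

print(n_dupes, bill_unique.shape, bill_unique_by_id.shape)
```

Whether to compare whole rows or only key columns depends on what counts as a duplicate in the data at hand, so the subset argument is worth choosing deliberately.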


104.2.6 Sorting the data in Python

In the previous post we created subsets of data by filtering on conditions. In this post we will create new subsets by sorting on the values of one or more columns.

Sorting the data

We will use the Online Retail dataset.

In [10]:
Online_Retail=pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding = "ISO-8859-1")
Online_Retail.head(5)
Out[10]:
InvoiceNo StockCode Description …
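Sorting on one or more columns can be sketched with sort_values. The table below is a made-up sample, not the Online Retail file; the Quantity and UnitPrice columns and all values are assumptions for illustration, since the excerpt only shows the first few column names.

```python
import pandas as pd

# Hypothetical sample; column names beyond InvoiceNo and all values are assumed
retail_sample = pd.DataFrame({
    "InvoiceNo": ["A3", "A1", "A2", "A4"],
    "Quantity": [6, 2, 6, 1],
    "UnitPrice": [2.5, 4.0, 1.0, 3.0],
})

# Sort by a single column, ascending by default
by_price = retail_sample.sort_values("UnitPrice")

# Sort by two columns: Quantity descending, then UnitPrice ascending within ties
by_qty_price = retail_sample.sort_values(
    ["Quantity", "UnitPrice"], ascending=[False, True]
)

print(by_price)
print(by_qty_price)
```

sort_values returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a name to be kept.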


104.2.3 Manipulating datasets in Python

In this post we will see how to split an imported dataset into subsets.

Sub-setting the data

Dataset: “./World Bank Data/GDP.csv”

In [30]:
import pandas as pd
#The line below may throw an error
gdp=pd.read_csv("datasets\\World Bank Data\\GDP.csv",encoding = "ISO-8859-1")
gdp.shape
Out[30]:
(194, 4)
In [29]:
gdp.columns.values
Out[29]:
array(['Country_code', 'Rank', 'Country', 'GDP'], dtype=object)
…
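A few common ways of taking subsets can be sketched as follows. The column names match the ones printed above, but gdp_sample and every value in it are made up for illustration and are not taken from GDP.csv.

```python
import pandas as pd

# Hypothetical rows using the column names shown above; values are invented
gdp_sample = pd.DataFrame({
    "Country_code": ["AAA", "BBB", "CCC", "DDD"],
    "Rank": [1, 2, 3, 4],
    "Country": ["Country A", "Country B", "Country C", "Country D"],
    "GDP": [900, 700, 400, 100],
})

# First n rows
first_two_rows = gdp_sample.head(2)

# Rows picked by integer position
rows_by_position = gdp_sample.iloc[[0, 2]]

# A subset of columns, selected by name
two_columns = gdp_sample[["Country", "GDP"]]

# Rows that satisfy a condition
top_ranked = gdp_sample[gdp_sample["Rank"] <= 2]

print(first_two_rows, rows_by_position, two_columns, top_ranked, sep="\n\n")
```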
