Home / Python / 104.2.8 Joining and Merging datasets in Python

104.2.8 Joining and Merging datasets in Python

In this post we will learn how to merge two datasets in python using pandas library.

Data sets merging and Joining

  • Datasets:
    • TV Commercial Slots Analysis/orders.csv
    • TV Commercial Slots Analysis/slots.csv
In [103]:
orders=pd.read_csv("datasets\\TV Commercial Slots Analysis\\orders.csv")
orders.shape
Out[103]:
(1369, 9)
In [105]:
slots=pd.read_csv("datasets\\TV Commercial Slots Analysis\\slots.csv")
slots.shape
Out[105]:
(1764, 17)
In [108]:
# duplicates based on Unique_id
sum(orders.Unique_id.duplicated())
Out[108]:
3
In [109]:
# duplicates based on Unique_id
sum(slots.Unique_id.duplicated())
Out[109]:
13
In [110]:
#Removing Duplicates
orders1=orders.drop_duplicates(['Unique_id'])
slots1=slots.drop_duplicates(['Unique_id'])
In [111]:
sum(orders1.Unique_id.duplicated())
Out[111]:
0
In [112]:
sum(slots1.Unique_id.duplicated())
Out[112]:
0

Inner Join

In [113]:
inner_data=pd.merge(orders1, slots1, on='Unique_id', how='inner')
inner_data.shape
Out[113]:
(8, 25)

Outer Join

In [114]:
outer_data=pd.merge(orders1, slots1, on='Unique_id', how='outer')
outer_data.shape
Out[114]:
(3109, 25)

Left outer Join

In [115]:
L_outer_data=pd.merge(orders1, slots1, on='Unique_id', how='left')
L_outer_data.shape
Out[115]:
(1366, 25)

Righ outer Join

In [116]:
R_outer_data=pd.merge(orders1, slots1, on='Unique_id', how='right')
R_outer_data.shape
Out[116]:
(1751, 25)

LAB: Data Joins

  • Datasets
    • “./Telecom Data Analysis/Bill.csv”
    • “./Telecom Data Analysis/Complaints.csv”
  • Import the data and remove duplicates based on cust_id
  • Create a dataset for each of these requirements
  • All the customers who appear either in bill data or complaints data
  • All the customers who appear both in bill data and complaints data
  • All the customers from bill data: Customers who have bill data along with their complaints
  • All the customers from complaints data: Customers who have Complaints data along with their bill info
In [117]:
comp_data=pd.read_csv("datasets\\Telecom Data Analysis\\Complaints.csv")
comp_data.shape
Out[117]:
(6587, 8)
In [118]:
bill_data=pd.read_csv("datasets\\Telecom Data Analysis\\Bill.csv")
bill_data.shape
Out[118]:
(9462, 7)
In [119]:
#Import the data and remove duplicates based on cust_id
comp_data1=comp_data.drop_duplicates(['cust_id'])
comp_data1.shape
Out[119]:
(4856, 8)
In [120]:
bill_data1=bill_data.drop_duplicates(['cust_id'])
bill_data1.shape
Out[120]:
(9389, 7)
In [121]:
#All the customers who appear either in bill data or complaints data 
combined1=pd.merge(comp_data1, bill_data1, on='cust_id', how='outer')
combined1.shape
Out[121]:
(13952, 14)
In [122]:
#All the customers who appear both in bill data and complaints data
combined2=pd.merge(comp_data1, bill_data1, on='cust_id', how='inner')
combined2.shape
Out[122]:
(293, 14)
In [123]:
#All the customers from bill data: Customers who have bill data along with their complaints
combined3=pd.merge(comp_data1, bill_data1, on='cust_id', how='right')
combined3.shape
Out[123]:
(9389, 14)
In [124]:
#All the customers from complaints data: Customers who have Complaints data along with their bill info
combined4=pd.merge(comp_data1, bill_data1, on='cust_id', how='left')
combined4.shape
Out[124]:
(4856, 14)

About admin

Check Also

204.7.5 The Random Forest

Random Forest Like many trees form a forest, many decision tree model together form a …

Leave a Reply

Your email address will not be published. Required fields are marked *