Why do we need correlation?
- Is there any association between hours of study and grades?
- Is there any association between number of temples in a city & murder rate?
- What happens to sweater sales as temperature increases? What is the strength of association between them?
- What happens to ice-cream sales as temperature increases? What is the strength of association between them?
- How do we quantify the association?
- Which of the above examples has a very strong association?
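- The correlation coefficient r is a measure of linear association between two variables.
- r is the ratio of the covariance of the two variables (how they vary together) to the product of their individual standard deviations.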
- Correlation 0: no linear association
- Correlation 0 to 0.25: negligible positive association
- Correlation 0.25 to 0.5: weak positive association
- Correlation 0.5 to 0.75: moderate positive association
- Correlation above 0.75: very strong positive association
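The definition and the strength bands above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `pearson_r` and `strength_label` and the sample `hours` and `grades` values are hypothetical, the handling of values that fall exactly on a band boundary is an assumption, and so is mirroring the same bands for negative correlations.

```python
import math

def pearson_r(x, y):
    """Pearson r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations (proportional to the covariance)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the two variances)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # The common 1/n factors cancel, so they are left out
    return sxy / math.sqrt(sxx * syy)

def strength_label(r):
    """Map r to the bands listed above (boundary handling is an assumption)."""
    size = abs(r)
    if size == 0:
        return "no linear association"
    # Assumption: negative values use the same bands, mirrored
    direction = "positive" if r > 0 else "negative"
    if size <= 0.25:
        level = "negligible"
    elif size <= 0.5:
        level = "weak"
    elif size <= 0.75:
        level = "moderate"
    else:
        level = "very strong"
    return f"{level} {direction} association"

# Made-up illustrative data: hours of study and grades
hours = [1, 2, 3, 4, 5]
grades = [52, 58, 65, 70, 80]
r = pearson_r(hours, grades)
print(round(r, 3), strength_label(r))
```

Because r divides out the scale of both variables, it always lies between -1 and 1, which is what makes the fixed bands usable across very different datasets.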
Practice: Correlation Calculation
- Dataset: AirPassengers\AirPassengers.csv
- Find the correlation between number of passengers and promotional budget.
- Draw a scatter plot between number of passengers and promotional budget
- Find the correlation between number of passengers and Service_Quality_Score
import pandas as pd

# Load the dataset and check its dimensions
air = pd.read_csv("datasets\\AirPassengers\\AirPassengers.csv")
air.shape

# List the column names
air.columns.values
array(['Week_num', 'Passengers', 'Promotion_Budget', 'Service_Quality_Score', 'Holiday_week', 'Delayed_Cancelled_flight_ind', 'Inter_metro_flight_ratio', 'Bad_Weather_Ind', 'Technical_issues_ind'], dtype=object)
# Find the correlation between number of passengers and promotional budget.
import numpy as np
np.corrcoef(air.Passengers, air.Promotion_Budget)
array([[ 1.        ,  0.96585103],
       [ 0.96585103,  1.        ]])
# Draw a scatter plot between number of passengers and promotional budget
import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(air.Passengers, air.Promotion_Budget)
[Figure: scatter plot of Passengers (x-axis) against Promotion_Budget (y-axis)]
# Find the correlation between number of passengers and Service_Quality_Score
np.corrcoef(air.Passengers, air.Service_Quality_Score)
array([[ 1.        , -0.88653002],
       [-0.88653002,  1.        ]])
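Each `np.corrcoef` call above returns a 2x2 matrix rather than a single number. The short sketch below shows how that matrix is laid out and how to pull out the one value of interest. The arrays `x` and `y` are made-up illustrative values, not rows from AirPassengers.csv.

```python
import numpy as np

# Hypothetical illustrative values, NOT taken from AirPassengers.csv
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# corrcoef treats each input as one variable and returns a 2x2 matrix:
# [0, 0] and [1, 1] are each variable's correlation with itself (always 1),
# [0, 1] and [1, 0] both hold the correlation between x and y.
corr_matrix = np.corrcoef(x, y)

# Extract the single coefficient between x and y
r_xy = corr_matrix[0, 1]
print(corr_matrix.shape)
print(r_xy)
```

Read this way, the two outputs above give r of about 0.97 for Promotion_Budget and about -0.89 for Service_Quality_Score, so the second relationship is also strong but runs in the opposite direction.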
Beyond Pearson Correlation
- Correlation coefficient measures for different types of data
| Variable Y \ X | Quantitative/Continuous X | Ordinal/Ranked/Discrete X | Nominal/Categorical X |
|---|---|---|---|
| Quantitative Y | Pearson r | Biserial r_b | Point biserial r_pb |
| Ordinal/Ranked/Discrete Y | Biserial r_b | Spearman rho / Kendall's tau | Rank biserial r_rb |
| Nominal/Categorical Y | Point biserial r_pb | Rank biserial r_rb | Phi, contingency coefficient, Cramér's V |
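As one concrete case from the table, Spearman's rho can be read as Pearson's r applied to the ranks of the data instead of the raw values. The sketch below illustrates that idea in plain Python. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, the sample data is made up, and ties are given the average of their ranks, which is a common convention assumed here.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson r: covariance over the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rho: Pearson r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: y rises with x every time, but not along a straight line
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(pearson_r(x, y))     # less than 1, because the relationship is curved
print(spearman_rho(x, y))  # 1 up to rounding, because the ordering agrees perfectly
```

The contrast between the two printed values is the practical reason to look beyond Pearson: r only rewards straight-line patterns, while a rank-based coefficient rewards any consistently increasing or decreasing pattern.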