203.1.5 Phi-Correlation Coefficient

The biserial correlation does not apply when both variables are binary. In that case we use the Phi-coefficient. Several other measures, such as chi-square, the odds ratio and the contingency coefficient, also indicate whether two categorical variables are associated or independent. All of them start from a contingency table, so we need to prepare one before calculating these correlations:

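|       | Y=1 | Y=0 | Total |
|-------|-----|-----|-------|
| X=1   | n_{11} | n_{10} | n_{1\bullet} |
| X=0   | n_{01} | n_{00} | n_{0\bullet} |
| Total | n_{\bullet 1} | n_{\bullet 0} | n |

Here n_{1\bullet} and n_{0\bullet} are the row totals, n_{\bullet 1} and n_{\bullet 0} are the column totals, and n is the grand total.

\phi=\frac{n_{11}n_{00}-n_{10}n_{01}}{\sqrt{n_{1\bullet}\,n_{0\bullet}\,n_{\bullet 0}\,n_{\bullet 1}}}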
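As a quick check of the formula, here is a minimal R sketch that computes phi directly from the four cell counts. The function name `phi_from_counts` is hypothetical (it is not part of any package), and the counts are taken from the bad-weather table in the lab solution, with YES coded as 1 and NO as 0.

```r
# Hypothetical helper: phi from the four cells of a 2x2 contingency table.
# n11 = (X=1, Y=1), n10 = (X=1, Y=0), n01 = (X=0, Y=1), n00 = (X=0, Y=0)
phi_from_counts <- function(n11, n10, n01, n00) {
  row1 <- n11 + n10   # total for X = 1
  row0 <- n01 + n00   # total for X = 0
  col1 <- n11 + n01   # total for Y = 1
  col0 <- n10 + n00   # total for Y = 0
  # as.numeric() avoids integer overflow when the margins are multiplied
  (n11 * n00 - n10 * n01) / sqrt(as.numeric(row1) * row0 * col1 * col0)
}

# Bad-weather table: YES/YES = 38, YES/NO = 2, NO/YES = 3, NO/NO = 37
phi_weather <- phi_from_counts(n11 = 38, n10 = 2, n01 = 3, n00 = 37)
round(phi_weather, 3)
```

The rounded result, 0.875, agrees with the Phi-coefficient that `assocstats` prints for the same table.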

LAB: Correlation between Categorical Variables

  1. Is there any association between Bad_weather and delayed_cancelled_flight?
  2. Is there any association between technical issues and delayed_cancelled_flight? Are the flights getting delayed by technical issues?
  3. Find correlation between holiday week and delayed flight indicator

Solution

1. Is there any association between Bad_weather and delayed_cancelled_flight?

>library(vcd)
## Warning: package 'vcd' was built under R version 3.1.3
## Loading required package: grid
>contin_table<-table(air$Bad_weather, air$delayed_cancelled_flight)
>contin_table
##      
##       NO YES
##   NO  37   3
##   YES  2  38
>assocstats(contin_table)
##                     X^2 df  P(> X^2)
## Likelihood Ratio 73.662  1 0.000e+00
## Pearson          61.288  1 4.885e-15
## 
## Phi-Coefficient   : 0.875 
## Contingency Coeff.: 0.659 
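## Cramer's V        : 0.875

Yes. A Phi-coefficient of 0.875, together with a Pearson chi-square p-value of 4.885e-15, points to a strong association between bad weather and delayed or cancelled flights.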
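The three coefficients in this output can all be derived from the Pearson chi-square statistic. The sketch below rebuilds the table from the printed counts, since the `air` data itself is not shown here, and uses only base R. The variable names are illustrative. The formulas are the standard textbook definitions and are not taken from the `vcd` source code.

```r
# Rebuild the bad-weather table from the counts printed in the output
tab <- matrix(c(37, 2, 3, 38), nrow = 2,
              dimnames = list(Bad_weather = c("NO", "YES"),
                              Delayed = c("NO", "YES")))

n <- sum(tab)
# Pearson chi-square without Yates' continuity correction
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

phi       <- sqrt(chi2 / n)              # magnitude of phi for a 2x2 table
cont_coef <- sqrt(chi2 / (chi2 + n))     # Pearson's contingency coefficient
k         <- min(dim(tab))               # smaller of the two table dimensions
cramers_v <- sqrt(chi2 / (n * (k - 1)))  # equals phi when the table is 2x2

round(c(chi2 = chi2, phi = phi, cont_coef = cont_coef, cramers_v = cramers_v), 3)
```

For a 2x2 table Cramér's V reduces to phi, which is why both are reported as 0.875.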

2. Is there any association between technical issues and delayed_cancelled_flight? Are the flights getting delayed by technical issues?

>contin_table<-table(air$Technical_issues_ind, air$delayed_cancelled_flight)
>contin_table
##      
##       NO YES
##   NO  17  25
##   YES 22  16
>assocstats(contin_table)
##                     X^2 df P(> X^2)
## Likelihood Ratio 2.4345  1  0.11869
## Pearson          2.4227  1  0.11959
## 
## Phi-Coefficient   : 0.174 
## Contingency Coeff.: 0.171 
## Cramer's V        : 0.174

No. The association between these variables is very weak: the Phi-coefficient is 0.174, and the Pearson chi-square p-value of 0.11959 is not significant at the 5% level.
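The value printed by `assocstats` appears to be derived from the chi-square statistic, so it carries no sign. To see the direction of the association, one option is to compute the ordinary Pearson correlation on 0/1 coded variables, which for two binary variables is the same as the signed phi. The sketch below rebuilds the data from the printed counts; the variable names are illustrative assumptions.

```r
# Rebuild the technical-issues table from the counts printed above
tab <- matrix(c(17, 22, 25, 16), nrow = 2,
              dimnames = list(Technical_issues = c("NO", "YES"),
                              Delayed = c("NO", "YES")))

# Expand the table into one row per flight
cells <- as.data.frame(as.table(tab))
idx <- rep(seq_len(nrow(cells)), times = cells$Freq)

# Code NO as 0 and YES as 1
x <- as.integer(cells$Technical_issues[idx] == "YES")
y <- as.integer(cells$Delayed[idx] == "YES")

# Pearson correlation of two 0/1 variables is the signed phi
signed_phi <- cor(x, y)
round(signed_phi, 3)
```

The result is -0.174: in this sample, flights with technical issues were slightly less often delayed or cancelled, although the association is too weak to read much into.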

3. Find correlation between holiday week and delayed flight indicator

>contin_table<-table(air$Holiday_week, air$delayed_cancelled_flight)
>contin_table
##      
##       NO YES
##   NO  30  31
##   YES  9  10
>assocstats(contin_table)
##                       X^2 df P(> X^2)
## Likelihood Ratio 0.019045  1  0.89024
## Pearson          0.019037  1  0.89026
## 
## Phi-Coefficient   : 0.015 
## Contingency Coeff.: 0.015 
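## Cramer's V        : 0.015

The Phi-coefficient is only 0.015 and the p-value is about 0.89, so there is practically no association between holiday week and the delayed flight indicator.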
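The odds ratio, mentioned at the start of this section as another measure of association, can be read off the same table. This is a minimal base R sketch using the printed counts; the variable names are illustrative and the calculation is the plain cross-product ratio, with no confidence interval.

```r
# Rebuild the holiday-week table from the counts printed above
tab <- matrix(c(30, 9, 31, 10), nrow = 2,
              dimnames = list(Holiday_week = c("NO", "YES"),
                              Delayed = c("NO", "YES")))

# Odds of a delayed or cancelled flight within each group
odds_holiday <- tab["YES", "YES"] / tab["YES", "NO"]   # 10 / 9
odds_regular <- tab["NO", "YES"]  / tab["NO", "NO"]    # 31 / 30

# An odds ratio close to 1 means the odds barely differ between the groups
odds_ratio <- odds_holiday / odds_regular
round(odds_ratio, 3)
```

The odds ratio comes out at about 1.075, which is consistent with the near-zero Phi-coefficient.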
