# 203.1.5 Phi-Correlation Coefficient

The biserial correlation does not work when both variables are binary (or, more generally, categorical). In that case we use the phi coefficient. In fact, there are many measures, such as chi-square, the odds ratio, and the contingency coefficient, that give an idea of the association or independence of two categorical variables. We need to prepare a contingency table before calculating these measures:

|       | Y=1 | Y=0 | Total |
|-------|-----|-----|-------|
| X=1   |     |     |       |
| X=0   |     |     |       |
| Total |     |     |       |
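For reference, the phi coefficient can be written directly in terms of the four cells of a 2×2 contingency table. The cell labels below are introduced here for illustration only: $a$ is the count for (X=1, Y=1), $b$ for (X=1, Y=0), $c$ for (X=0, Y=1), $d$ for (X=0, Y=0), and $n = a + b + c + d$.

$$
\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}},
\qquad
\phi^2 = \frac{\chi^2}{n}
$$

Here $\chi^2$ is the Pearson chi-square statistic computed without continuity correction. The sign of $\phi$ gives the direction of the association, and its magnitude gives the strength.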

LAB: Correlation between Categorical Variables

1. Is there any association between Bad_weather and delayed_cancelled_flight?
2. Is there any association between technical issues and delayed_cancelled_flight? Are the flights getting delayed by technical issues?
3. Find correlation between holiday week and delayed flight indicator

Solution

1.  Is there any association between Bad_Weather_Ind and Delayed_Cancelled_flight_ind?

>library(vcd)
## Warning: package 'vcd' was built under R version 3.1.3
## Loading required package: grid
>contin_table<-table(air$Bad_weather, air$delayed_cancelled_flight)
>contin_table
##
##       NO YES
##   NO  37   3
##   YES  2  38
>assocstats(contin_table)
##                     X^2 df  P(> X^2)
## Likelihood Ratio 73.662  1 0.000e+00
## Pearson          61.288  1 4.885e-15
##
## Phi-Coefficient   : 0.875
## Contingency Coeff.: 0.659
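## Cramer's V        : 0.875

Yes. The phi coefficient of 0.875 indicates a strong association between bad weather and delayed or cancelled flights, and the very small p-value shows that an association this strong is unlikely to be due to chance.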
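As a cross-check, the reported phi coefficient can be reproduced in base R from the four cell counts printed above. This is a minimal sketch: the matrix `tab` is typed in by hand from the printed table rather than built from the `air` data frame, and the variable names are chosen here for illustration.

```r
# Cell counts copied from the printed contingency table
# (rows: Bad_weather, columns: delayed_cancelled_flight)
tab <- matrix(c(37, 2, 3, 38), nrow = 2,
              dimnames = list(Bad_weather = c("NO", "YES"),
                              delayed_cancelled_flight = c("NO", "YES")))

# Pearson chi-square without Yates' continuity correction
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

# Phi from the chi-square statistic: sqrt(chi2 / n)
phi_from_chi2 <- sqrt(chi2 / sum(tab))

# Phi straight from the cell counts: |ad - bc| / sqrt(product of the margins)
phi_from_counts <- abs(tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab), colSums(tab)))

round(c(chi2 = chi2,
        phi_from_chi2 = phi_from_chi2,
        phi_from_counts = phi_from_counts), 3)
```

Both routes should agree with the Pearson statistic and the phi coefficient in the `assocstats` output.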

2. Is there any association between technical issues and Delayed_Cancelled_flight_ind? Are the flights getting delayed by technical issues?

>contin_table<-table(air$Technical_issues_ind, air$delayed_cancelled_flight)
>contin_table
##
##       NO YES
##   NO  17  25
##   YES 22  16
>assocstats(contin_table)
##                     X^2 df P(> X^2)
## Likelihood Ratio 2.4345  1  0.11869
## Pearson          2.4227  1  0.11959
##
## Phi-Coefficient   : 0.174
## Contingency Coeff.: 0.171
## Cramer's V        : 0.174

No. The association between these variables is very weak: the phi coefficient is only 0.174, and the chi-square p-value of about 0.12 is not significant at the conventional 5% level.
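The phi value of 0.174 reported here is a magnitude only. For two binary variables, phi is the Pearson correlation of their 0/1 indicators, so the direction can be recovered with `cor()`. The sketch below rebuilds one row per flight from the printed counts; the vectors `tech` and `delay` are hypothetical stand-ins for the columns of `air`, not the actual data.

```r
# Cell counts from the printed table, in the order
# (tech = NO, delay = NO), (YES, NO), (NO, YES), (YES, YES)
counts <- c(17, 22, 25, 16)

# Expand the counts into 0/1 indicator vectors, one element per flight
tech  <- rep(c(0, 1, 0, 1), times = counts)
delay <- rep(c(0, 0, 1, 1), times = counts)

# Pearson correlation of two 0/1 variables is the signed phi coefficient
signed_phi <- cor(tech, delay)
round(signed_phi, 3)

# Share of delayed/cancelled flights with and without technical issues
tapply(delay, tech, mean)
```

The correlation is negative: in this sample, flights with technical issues were delayed or cancelled slightly less often than flights without them, although the difference is not statistically significant.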

3. Find correlation between holiday week and delayed flight indicator

>contin_table<-table(air$Holiday_week, air$delayed_cancelled_flight)
>contin_table
##
##       NO YES
##   NO  30  31
##   YES  9  10
>assocstats(contin_table)
##                       X^2 df P(> X^2)
## Likelihood Ratio 0.019045  1  0.89024
## Pearson          0.019037  1  0.89026
##
## Phi-Coefficient   : 0.015
## Contingency Coeff.: 0.015
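## Cramer's V        : 0.015

No. With a phi coefficient of 0.015 and a p-value of about 0.89, there is practically no association between holiday week and delayed or cancelled flights.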
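The odds ratio, one of the other measures mentioned at the start of this section, tells the same story for this table. A minimal sketch, again typing the counts in by hand from the printed output (the object names are illustrative):

```r
# Cell counts copied from the printed table
# (rows: Holiday_week, columns: delayed_cancelled_flight)
tab <- matrix(c(30, 9, 31, 10), nrow = 2,
              dimnames = list(Holiday_week = c("NO", "YES"),
                              delayed_cancelled_flight = c("NO", "YES")))

# Odds of a delayed/cancelled flight in each group
odds_non_holiday <- tab["NO", "YES"] / tab["NO", "NO"]    # 31 / 30
odds_holiday     <- tab["YES", "YES"] / tab["YES", "NO"]  # 10 / 9

# Odds ratio: values near 1 mean the odds barely differ between groups
odds_ratio <- odds_holiday / odds_non_holiday
round(odds_ratio, 3)
```

An odds ratio this close to 1 is consistent with the near-zero phi coefficient.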