
204.2.1 Logistic Regression, why do we need it?

In this series we will explore Logistic Regression models. To start, we will do a quick recap of Linear Regression and see whether it works all the time.

Practice : What is the need for logistic regression?

  • Dataset: Product Sales Data/Product_sales.csv
  • What are the variables in the dataset?
  • Build a predictive model for Bought vs Age
  • What is R-Square?
  • If Age is 4 then will that customer buy the product?
  • If Age is 105 then will that customer buy the product?
In [2]:
import pandas as pd

# Forward slashes in the path work on both Windows and Unix
sales = pd.read_csv("datasets/Product Sales Data/Product_sales.csv")
In [3]:
#What are the variables in the dataset?
sales.columns.values
array(['Age', 'Bought'], dtype=object)
In [4]:
#Build a predictive model for Bought vs Age

### statsmodels enables many statistical methods to be used in Python
import statsmodels.formula.api as sm

model = sm.ols(formula='Bought ~ Age', data=sales)
fitted = model.fit()
fitted.summary()
OLS Regression Results

Dep. Variable:    Bought            R-squared:           0.842
Model:            OLS               Adj. R-squared:      0.842
Method:           Least Squares     F-statistic:         2480.
Date:             Sun, 16 Oct 2016  Prob (F-statistic):  1.63e-188
Time:             14:35:39          Log-Likelihood:      95.589
No. Observations: 467               AIC:                 -187.2
Df Residuals:     465               BIC:                 -178.9
Df Model:         1
Covariance Type:  nonrobust

              coef    std err        t    P>|t|   [95.0% Conf. Int.]
Intercept  -0.1704      0.015  -11.156    0.000     -0.200    -0.140
Age         0.0209      0.000   49.803    0.000      0.020     0.022

Omnibus:        77.279   Durbin-Watson:     1.362
Prob(Omnibus):  0.000    Jarque-Bera (JB):  1022.092
Skew:           0.056    Prob(JB):          1.14e-222
Kurtosis:       10.247   Cond. No.          60.7
In [5]:
#What is R-Square?
fitted.rsquared   # 0.842, as reported in the summary above
In [6]:
#If Age is 4 then will that customer buy the product?

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(sales[["Age"]], sales[["Bought"]])
lr.predict([[4]])   # ≈ -0.09 from the fitted line: a negative, impossible "probability"
In [7]:
#If Age is 105 then will that customer buy the product?
lr.predict([[105]])
array([[ 2.02851132]])

Something went wrong

  • The model that we built above is not right.
  • There are issues with the type of the dependent variable.
  • The dependent variable is not continuous; it is binary.
  • We cannot fit a linear regression line to this data.
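Since the dependent variable is binary, a classification model such as logistic regression is the right tool. As a minimal sketch (using synthetic data in place of Product_sales.csv, with the purely illustrative assumption that older customers tend to buy), a logistic model keeps every prediction between 0 and 1:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the Product sales data:
# ages between 6 and 80, and a hypothetical rule that
# customers above roughly age 40 tend to buy.
rng = np.random.default_rng(0)
age = rng.integers(6, 81, size=500)
bought = (age + rng.normal(0, 8, size=500) > 40).astype(int)

clf = LogisticRegression()
clf.fit(age.reshape(-1, 1), bought)

# Unlike the linear model, predicted probabilities never
# leave the [0, 1] interval, even for extreme ages.
probs = clf.predict_proba(np.array([[4], [40], [105]]))[:, 1]
print(probs)
```

Compare this with the linear fit above, which produced a negative value for Age = 4 and a value above 2 for Age = 105.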

Why not linear?

  • Consider the Product sales data. The dataset has two columns:
    • Age – a continuous variable between 6 and 80
    • Bought (1 – Yes; 0 – No)

Real-life examples

  • Gaming – Win vs. Loss
  • Sales – Buying vs. Not buying
  • Marketing – Response vs. No Response
  • Credit card & Loans – Default vs. Non Default
  • Operations – Attrition vs. Retention
  • Websites – Click vs. No click
  • Fraud identification – Fraud vs. Non Fraud
  • Healthcare – Cure vs. No Cure

The output in each of these cases is binary, and such outcomes cannot be justified with a linear model.

Some Nonlinear Functions
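One such nonlinear function is the sigmoid (logistic) function, which maps any real number into the interval (0, 1) – exactly the shape a probability needs. A minimal sketch in plain Python:

```python
import math

def sigmoid(x):
    """Logistic function: maps any real x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Output stays bounded no matter how extreme the input
print(sigmoid(-10))   # close to 0
print(sigmoid(0))     # exactly 0.5
print(sigmoid(10))    # close to 1
```

Logistic regression fits a linear expression such as b0 + b1*Age and passes it through this function, so every prediction is a valid probability.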
