
204.2.2 Logistic Function to Regression

In the last post we saw that linear regression cannot be used when the final output is binary (yes or no), since it is hard to fit a binary output with a linear function.

To solve this problem we can move to a different kind of function, with the logistic function being the natural first choice.

A Logistic Function

This is what the logistic function looks like:

[Figure: the S-shaped curve of the logistic function]

  • We want a model that predicts probabilities between 0 and 1, that is, an S-shaped curve.
  • There are lots of S-shaped curves. We use the logistic model (sketched in code below):
    Probability = e^(β0 + β1X) / (1 + e^(β0 + β1X))
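To get a feel for the S-shape, here is a minimal sketch that plots the logistic function. The coefficients β0 = -5 and β1 = 0.1 are illustrative values chosen only for the plot, not taken from any fitted model:

import numpy as np
import matplotlib.pyplot as plt

# Illustrative coefficients, chosen only to show the S-shape
b0, b1 = -5.0, 0.1

x = np.linspace(0, 100, 200)
prob = np.exp(b0 + b1 * x) / (1 + np.exp(b0 + b1 * x))

plt.plot(x, prob)
plt.xlabel("X")
plt.ylabel("Probability")
plt.title("The logistic function")
plt.show()

Note how the curve stays between 0 and 1 for any value of X, which is exactly the property a linear function lacks.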

Logistic Regression Output

  • In logistic regression we try to predict the probability instead of direct values.
  • Y is binary: it takes only two values, 1 and 0. Instead of predicting 1 or 0, we predict the probability of 1 and the probability of 0 (see the cutoff sketch below).
  • This suits binary categorical outputs aptly: YES vs NO, WIN vs LOSS, Fraud vs Non-Fraud.
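Since the model outputs a probability, getting a hard YES/NO answer is just a matter of applying a cutoff. A tiny sketch (the 0.5 threshold is the conventional default, not something dictated by the model):

# Turn a predicted probability into a 0/1 class with a 0.5 cutoff
p = 0.73                       # example: predicted probability of class 1
predicted_class = 1 if p >= 0.5 else 0
print(predicted_class)         # prints 1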

Practice : Logistic Regression

  • Dataset: Product Sales Data/Product_sales.csv
  • Build a logistic regression model between Age and Bought
  • Will a 4-year-old customer buy the product?
  • If Age is 105, will that customer buy the product?
In [8]:
import pandas as pd 
sales = pd.read_csv("datasets\\Product Sales Data\\Product_sales.csv")

import statsmodels.api as sm

# Build a logistic regression model of Bought on Age
# (no intercept term is added here, which is why the summary below
# reports Df Model 0 and an LLR p-value of nan)
logit = sm.Logit(sales['Bought'], sales['Age'])
logit
Out[8]:
<statsmodels.discrete.discrete_model.Logit at 0x203ba4ac630>
In [9]:
result = logit.fit()
result
Optimization terminated successfully.
         Current function value: 0.584320
         Iterations 5
Out[9]:
<statsmodels.discrete.discrete_model.BinaryResultsWrapper at 0x203bbd90e48>
In [10]:
result.summary()
Out[10]:
Logit Regression Results
==============================================================================
Dep. Variable:                 Bought   No. Observations:                  467
Model:                          Logit   Df Residuals:                      466
Method:                           MLE   Df Model:                            0
Date:                Sun, 16 Oct 2016   Pseudo R-squ.:                  0.1478
Time:                        14:35:42   Log-Likelihood:                -272.88
converged:                       True   LL-Null:                       -320.21
                                        LLR p-value:                       nan
==============================================================================
          coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Age     0.0294      0.003      8.813      0.000         0.023     0.036
==============================================================================
In [11]:
# Confidence interval of each coefficient

print (result.conf_int())
            0         1
Age  0.022851  0.035923
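The fitted statsmodels model can also answer the practice questions directly. A sketch (remember this fit has no intercept, so the predicted probability is the logistic function applied to coef × Age):

# Predicted probability of buying at ages 4 and 105
# (exog must be 2-D: one row per observation, one column for Age)
print(result.predict([[4], [105]]))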
In [12]:
# One more way of fitting the model: scikit-learn
# (unlike the statsmodels call above, this fits an intercept by default)
from sklearn.linear_model import LogisticRegression
logistic = LogisticRegression()
logistic.fit(sales[["Age"]], sales["Bought"])
Out[12]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
In [13]:
# Will a 4-year-old customer buy the product?
age1 = [[4]]                   # sklearn expects a 2-D array of samples
predict_age1 = logistic.predict(age1)
print(predict_age1)
[0]
In [14]:
# If Age is 105, will that customer buy the product?
age2 = [[105]]
predict_age2 = logistic.predict(age2)
print(predict_age2)
[1]
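Since logistic regression fundamentally predicts probabilities, the class labels above can be unpacked into probabilities with predict_proba. A small sketch (each row gives [P(Bought=0), P(Bought=1)] for one age):

# Predicted probabilities behind the two class predictions above
print(logistic.predict_proba([[4], [105]]))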
