In the last post we saw that linear regression cannot be used when the final output is binary (yes or no), because it is hard to fit a binary output with a linear function.
To solve this problem we can turn to a different kind of function, the logistic function being the first choice.
A Logistic Function
This is what a logistic function looks like:
The Logistic Function
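- We want a model that predicts probabilities, so its output must stay between 0 and 1. An S-shaped curve does this.
- There are many S-shaped curves. We use the logistic model: $P(y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$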
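To make the S shape concrete, here is a minimal sketch of the logistic function in plain Python. The coefficients `beta0` and `beta1` are hypothetical values picked only for illustration; they are not fitted to the sales data.

```python
import math

def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_model(x, beta0, beta1):
    """Probability of the positive class for input x under the logistic model."""
    return logistic(beta0 + beta1 * x)

# Hypothetical coefficients, chosen only to show the S shape.
beta0, beta1 = -3.0, 0.08
for x in (0, 20, 40, 60, 80, 100):
    print(x, round(logistic_model(x, beta0, beta1), 3))
```

The printed values rise slowly at first, faster in the middle, and then flatten out as they approach 1, which is the behaviour we want from a probability model.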
Logistic Regression Output
- In logistic regression, we predict a probability instead of the value itself.
- Y is binary and takes only two values, 1 and 0. Instead of predicting 1 or 0 directly, we predict the probability of 1 and the probability of 0.
- This suits binary categorical outputs such as YES vs NO, WIN vs LOSS, or Fraud vs Non-Fraud.
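The relationship between the two probabilities and a final yes/no answer can be sketched as follows. The helper name `describe_prediction` and the 0.5 cutoff are assumptions made for this illustration; a different cutoff may suit a different problem.

```python
def describe_prediction(p_one, threshold=0.5):
    """Return P(Y=1), P(Y=0) and the class label implied by the cutoff."""
    p_zero = 1.0 - p_one                       # the two probabilities always sum to 1
    label = 1 if p_one >= threshold else 0     # assumed cutoff of 0.5
    return p_one, p_zero, label

for p in (0.12, 0.50, 0.87):
    print(describe_prediction(p))
```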
Practice: Logistic Regression
- Dataset: Product Sales Data/Product_sales.csv
- Build a logistic regression line between Age and buying.
- Will a 4-year-old customer buy the product?
- If Age is 105, will that customer buy the product?
import pandas as pd
import statsmodels.api as sm

sales = pd.read_csv("datasets\\Product Sales Data\\Product_sales.csv")

# Build a logistic regression line between Age and buying.
# Note: Age is the only regressor here, so this model has no intercept term.
logit = sm.Logit(sales['Bought'], sales['Age'])
logit
<statsmodels.discrete.discrete_model.Logit at 0x203ba4ac630>
result = logit.fit()
result
Optimization terminated successfully.
         Current function value: 0.584320
         Iterations 5
<statsmodels.discrete.discrete_model.BinaryResultsWrapper at 0x203bbd90e48>
| Dep. Variable: | Bought | No. Observations: | 467 |
| Date: | Sun, 16 Oct 2016 | Pseudo R-squ.: | 0.1478 |
| coef | std err | z | P>\|z\| | [95.0% Conf. Int.] |
# Confidence interval of each coefficient
print(result.conf_int())
            0         1
Age  0.022851  0.035923
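One way to read this interval is on the odds scale: exponentiating a logistic regression coefficient gives the factor by which the odds of buying change for each additional year of age. The sketch below does that arithmetic for the two interval endpoints printed above. The variable names are illustrative, and the interpretation assumes the model as fitted here, with Age as the only term.

```python
import math

# Endpoints of the 95% confidence interval for the Age coefficient (from the output above).
ci_low, ci_high = 0.022851, 0.035923

# exp(coefficient) is the factor by which the odds of buying change per extra year of age.
odds_ratio_low = math.exp(ci_low)
odds_ratio_high = math.exp(ci_high)

print(round(odds_ratio_low, 4), round(odds_ratio_high, 4))
```

Both values are slightly above 1, which says the odds of buying go up by a few percent with each extra year of age.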
# One more way of fitting the model
from sklearn.linear_model import LogisticRegression

logistic = LogisticRegression()
logistic.fit(sales[["Age"]], sales["Bought"])
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
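# Will a 4-year-old customer buy the product?
# predict expects a 2D input (one row per customer), so wrap the age in a one-row DataFrame.
age1 = 4
predict_age1 = logistic.predict(pd.DataFrame({"Age": [age1]}))
print(predict_age1)

# If Age is 105, will that customer buy the product?
age2 = 105
predict_age2 = logistic.predict(pd.DataFrame({"Age": [age2]}))
print(predict_age2)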
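Since the model is really predicting probabilities, it can help to look at them directly with `predict_proba` instead of only the 0/1 answer from `predict`. The sketch below is self-contained and uses synthetic data as a stand-in for Product_sales.csv, so the numbers it prints are not the results for the real dataset. The data-generating coefficients (0.08 and -3.0) and the names `model`, `new_ages`, and `proba` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for Product_sales.csv: the chance of buying rises with age.
rng = np.random.default_rng(0)
age = rng.uniform(5, 80, size=500)
true_prob = 1.0 / (1.0 + np.exp(-(0.08 * age - 3.0)))
bought = (rng.random(500) < true_prob).astype(int)

model = LogisticRegression()
model.fit(age.reshape(-1, 1), bought)

# predict_proba expects a 2D array: one row per customer, one column per feature.
new_ages = np.array([[4], [105]])
proba = model.predict_proba(new_ages)   # columns follow model.classes_, here [0, 1]
labels = model.predict(new_ages)

for a, p, label in zip(new_ages.ravel(), proba[:, 1], labels):
    print(f"Age {a}: P(buy) = {p:.3f}, predicted class = {label}")
```

The second column of `proba` is the probability of class 1 (bought), and `predict` returns whichever class has the larger probability.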