
# 204.6.2 Practice : Simple Classifier

In this practice session we apply what we discussed about simple classifiers in the previous post: we plot the two classes, fit a logistic regression model, and draw the resulting classifier on the data.

## Practice : Simple Classifiers

• Dataset: Fraud Transaction/Transactions_sample.csv
• Draw a classification graph that shows all the classes
• Build a logistic regression classifier
• Draw the classifier on the data plot

## Solution

In [1]:
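```
#Importing the dataset:
import pandas as pd
# Path taken from the dataset bullet above; adjust it to where the file is stored
Transactions_sample = pd.read_csv("Fraud Transaction/Transactions_sample.csv")
Transactions_sample.head(6)
```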
Out[1]:
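| | id | Total_Amount | Tr_Count_week | Fraud_id |
|---|---|---|---|---|
| 0 | 16078 | 7294.60 | 4.79 | 0 |
| 1 | 41365 | 7659.53 | 2.45 | 0 |
| 2 | 11666 | 8259.29 | 10.77 | 0 |
| 3 | 11824 | 11630.25 | 15.29 | 1 |
| 4 | 36414 | 12286.63 | 22.18 | 1 |
| 5 | 90 | 12783.34 | 16.34 | 1 |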
In [2]:
```
#Name of the columns
Transactions_sample.columns
```
Out[2]:
`Index(['id', 'Total_Amount', 'Tr_Count_week', 'Fraud_id'], dtype='object')`
In [3]:
```
#The classification graph distinguishing the two classes with colors and shapes.
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
ax1 = fig.add_subplot(111)  # axes that both scatter calls draw on

ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==0],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==0], s=10, c='b', marker="o", label='Fraud_id=0')
ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==1],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==1], s=10, c='r', marker="+", label='Fraud_id=1')
plt.legend(loc='upper left')
plt.show()
```
In [4]:
```
#Build a logistic regression model
import statsmodels.formula.api as smf
# Fraud_id is the response; Total_Amount and Tr_Count_week are the predictors
model1 = smf.logit(formula='Fraud_id ~ Total_Amount+Tr_Count_week', data=Transactions_sample)
fitted1 = model1.fit()
fitted1.summary()
```
```Optimization terminated successfully.
Current function value: 0.040114
Iterations 10
```
Out[4]:
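| Dep. Variable: | Fraud_id | No. Observations: | 210 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 207 |
| Method: | MLE | Df Model: | 2 |
| Date: | Mon, 14 Nov 2016 | Pseudo R-squ.: | 0.9421 |
| Time: | 19:18:37 | Log-Likelihood: | -8.4239 |
| converged: | True | LL-Null: | -145.55 |
| | | LLR p-value: | 2.795e-60 |

| | coef | std err | z | P>\|z\| | 95.0% Conf. Int. (lower) | 95.0% Conf. Int. (upper) |
|---|---|---|---|---|---|---|
| Intercept | -26.1481 | 7.817 | -3.345 | 0.001 | -41.469 | -10.827 |
| Total_Amount | 0.0025 | 0.001 | 2.386 | 0.017 | 0.000 | 0.005 |
| Tr_Count_week | 0.1089 | 0.270 | 0.403 | 0.687 | -0.421 | 0.638 |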
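To make the summary more concrete, the sketch below applies the logistic function by hand to two rows of the data preview. It assumes the three coefficients in the summary are listed in the order Intercept, Total_Amount, Tr_Count_week, and it uses them as rounded in the printout, so the probabilities will only approximate what `fitted1.predict` returns. The function name `fraud_probability` is introduced here for illustration.

```python
import math

# Coefficients as rounded in the summary printout
# (assumed order: Intercept, Total_Amount, Tr_Count_week).
B0, B1, B2 = -26.1481, 0.0025, 0.1089

def fraud_probability(total_amount, tr_count_week):
    """Apply the logistic function to the linear predictor."""
    z = B0 + B1 * total_amount + B2 * tr_count_week
    return 1.0 / (1.0 + math.exp(-z))

# Rows 0 and 3 of the data preview (Fraud_id 0 and 1 respectively)
p_row0 = fraud_probability(7294.60, 4.79)
p_row3 = fraud_probability(11630.25, 15.29)
print("Row 0 probability of fraud:", p_row0)
print("Row 3 probability of fraud:", p_row3)
```

With a threshold of 0.5, the first row falls in class 0 and the second in class 1, which agrees with their Fraud_id labels.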
In [5]:
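```
# Getting slope and intercept of the line
# Covariance matrix of the coefficient estimates (printed below)
print(fitted1.normalized_cov_params)

# Fitted coefficients: Intercept, Total_Amount, Tr_Count_week
coef = fitted1.params

# Boundary: Intercept + b1*Total_Amount + b2*Tr_Count_week = 0, solved for Tr_Count_week
slope1 = coef['Total_Amount'] / (-coef['Tr_Count_week'])
intercept1 = coef['Intercept'] / (-coef['Tr_Count_week'])
```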
```               Intercept  Total_Amount  Tr_Count_week
Intercept      61.106470     -0.008058       1.428005
Total_Amount   -0.008058      0.000001      -0.000237
Tr_Count_week   1.428005     -0.000237       0.072958
```
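The classifier line is the set of points where the predicted probability equals 0.5, that is, where the linear predictor b0 + b1*x + b2*y is zero. Solving for y gives a slope of -b1/b2 and an intercept of -b0/b2. The sketch below checks this with the rounded coefficients from the model summary; `boundary_line` is a hypothetical helper, and the coefficient order (Intercept, Total_Amount, Tr_Count_week) is an assumption.

```python
def boundary_line(b0, b1, b2):
    """Return (slope, intercept) of the line b0 + b1*x + b2*y = 0, solved for y."""
    if b2 == 0:
        raise ValueError("b2 must be non-zero to express the boundary as y = f(x)")
    return -b1 / b2, -b0 / b2

# Rounded coefficients from the summary (assumed order, see above)
b0, b1, b2 = -26.1481, 0.0025, 0.1089
slope, intercept = boundary_line(b0, b1, b2)

# Any point on the line should give a linear predictor of (almost exactly) zero
x = 10000.0
y = slope * x + intercept
z = b0 + b1 * x + b2 * y
print("slope:", slope, "intercept:", intercept, "linear predictor on the line:", z)
```

The slope is negative here because both predictor coefficients are positive: a larger Total_Amount needs a smaller Tr_Count_week to reach the same probability.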
In [8]:
```
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(111)  # axes for the scatter plots and the classifier line

ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==0],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==0], s=30, c='b', marker="o", label='Fraud_id 0')
ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==1],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==1], s=30, c='r', marker="+", label='Fraud_id 1')

# Limit both axes to the range of the data
plt.xlim(Transactions_sample.Total_Amount.min(), Transactions_sample.Total_Amount.max())
plt.ylim(Transactions_sample.Tr_Count_week.min(), Transactions_sample.Tr_Count_week.max())

plt.legend(loc='upper left')

# Draw the classifier line across the visible x range
x_min, x_max = ax1.get_xlim()
ax1.plot([x_min, x_max], [x_min*slope1+intercept1, x_max*slope1+intercept1])
plt.show()
```
In [9]:
```
#Accuracy of the model
import numpy as np
from sklearn.metrics import confusion_matrix as cm

# Predicted probability of fraud for every row
predicted_values=fitted1.predict(Transactions_sample[["Total_Amount", "Tr_Count_week"]])
print('Predicted Values: ', predicted_values[1:10])

# Rows with probability above the threshold are classified as fraud (1)
threshold=0.5
predicted_class=np.zeros(predicted_values.shape)
predicted_class[predicted_values>threshold]=1

print('Predicted Class: ', predicted_class)

#Creating the confusion matrix (rows: actual class, columns: predicted class)
ConfusionMatrix = cm(Transactions_sample['Fraud_id'],predicted_class)
print('Confusion Matrix: ', ConfusionMatrix)

# Correct predictions lie on the diagonal
accuracy=(ConfusionMatrix[0,0]+ConfusionMatrix[1,1])/ConfusionMatrix.sum()
print('Accuracy: ', accuracy)

error=1-accuracy
print('Error: ', error)
```
```Predicted Values:  [ 0.00154015  0.01714584  0.9932035   0.99938783  0.99967144  0.99846609
0.99793177  0.99981494  0.99991438]
Predicted Class:  [ 0.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.  0.  0.  0.  0.
1.  0.  0.  0.  1.  0.  1.  1.  0.  1.  0.  1.  0.  0.  0.  0.  1.  0.
1.  1.  0.  0.  0.  1.  1.  1.  0.  1.  1.  0.  1.  1.  0.  0.  0.  1.
1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.
0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.
0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.
0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.
1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.
1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.
0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.
0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.
0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.]
Confusion Matrix:  [[104   0]
[  1 105]]
Accuracy:  0.995238095238
Error:  0.0047619047619
```
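Accuracy alone can hide how the errors are split between the two classes, which matters for fraud data. As a small sketch, the code below derives precision, recall and specificity from the confusion matrix printed above, assuming the `sklearn` layout (rows are actual classes, columns are predicted classes). The counts are copied from the output rather than recomputed from the model.

```python
# Confusion matrix copied from the output above
# (rows: actual class 0/1, columns: predicted class 0/1)
confusion = [[104, 0],
             [1, 105]]
(tn, fp), (fn, tp) = confusion

total = tn + fp + fn + tp
accuracy = (tp + tn) / total       # share of all rows classified correctly
precision = tp / (tp + fp)         # share of predicted frauds that are real frauds
recall = tp / (tp + fn)            # share of real frauds that were caught
specificity = tn / (tn + fp)       # share of non-frauds correctly left alone

print("Accuracy:   ", accuracy)
print("Precision:  ", precision)
print("Recall:     ", recall)
print("Specificity:", specificity)
```

The single misclassified row is a fraud predicted as non-fraud (a false negative), so precision stays at 1 while recall drops slightly below 1.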