204.6.9 Digit Recognition using SVM

In this final post of the series, we put SVM into practice by solving an image classification problem.

Practice : Digit Recognition using SVM

  • Take an image of a handwritten single digit, and determine what that digit is.
  • The data are normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been deslanted and size-normalized, resulting in 16 x 16 grayscale images (Le Cun et al., 1990).
  • The data are in two gzipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values.
  • Build an SVM model that can be used as the digit recognizer
  • Use the test dataset to validate the true classification power of the model
  • What is the final accuracy of the model?
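The record layout described above (a digit id followed by 256 grayscale values) can be sketched as follows. This is a minimal illustration, not the post's loading code: the pixel values are made up, and the row-by-row reshape is an assumption about how the 256 values map onto the 16 x 16 grid.

```python
import numpy

# Hypothetical record in the zip.train / zip.test layout:
# a digit id, then 256 pixel values. The pixel values are made up.
fake_pixels = numpy.linspace(-1.0, 1.0, 256)
line = "7 " + " ".join(f"{v:.4f}" for v in fake_pixels)

values = numpy.array(line.split(), dtype=float)
label = int(values[0])               # first field is the digit id
image = values[1:].reshape(16, 16)   # remaining 256 fields, assumed row by row

print(label, image.shape)
```

The same split (column 0 for the label, columns 1 to 256 for the pixels) is what the slicing in the cells below relies on.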
In [36]:
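#Importing test and training data
import numpy
import pandas as pd
import matplotlib.pyplot as plt

train_data = numpy.loadtxt('datasets/Digit Recognizer/USPS/zip.train.txt')
test_data  = numpy.loadtxt('datasets/Digit Recognizer/USPS/zip.test.txt')

#Only the last expression is echoed, so the output below is the shape of test_data
train_data.shape
test_data.shape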
Out[36]:
(2007, 257)
In [37]:
#Display the first nine training digits in a 3 x 3 grid
plt.figure(figsize=(10,10))             #create the figure once, outside the loop
for i in range(0,9):
    data_row=train_data[i][1:]          #drop the label in column 0
    pixels=data_row.reshape(16,16)      #256 values -> 16 x 16 image
    plt.subplot(3,3,i+1)
    plt.imshow(pixels)
In [38]:
#Are there any missing values? (only the last expression is echoed)
sum(sum(pd.isnull(train_data)))
sum(sum(pd.isnull(test_data)))
Out[38]:
0
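The same missing-value check can also be written directly in NumPy. The sketch below uses a small made-up array in place of `train_data`, with one missing value planted on purpose so that the check has something to find.

```python
import numpy

# Made-up 3 x 4 array with one missing value, standing in for train_data
sample = numpy.array([[0.0, 1.0, -1.0, 0.5],
                      [1.0, numpy.nan, 0.2, 0.3],
                      [2.0, 0.1, 0.1, 0.1]])

nan_mask = numpy.isnan(sample)
missing_total = int(nan_mask.sum())                          # total number of NaN entries
rows_with_missing = numpy.flatnonzero(nan_mask.any(axis=1))  # rows that contain a NaN

print(missing_total, rows_with_missing)
```

Knowing which rows are affected, and not only the total count, makes it easier to decide whether to drop or repair them.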
In [39]:
#Each line consists of the digit id (0-9) followed by the 256 grayscale values,
#so the first column is the label
train_data1= pd.DataFrame(train_data)
train_data1[0].value_counts()
Out[39]:
0.0    1194
1.0    1005
2.0     731
6.0     664
3.0     658
4.0     652
7.0     645
9.0     644
5.0     556
8.0     542
Name: 0, dtype: int64
In [40]:
#Build an SVM model that can be used as the digit recognizer
########SVM Model Building
#Verify the code with small data (first 5000 training rows)
import time
from sklearn import svm

X1=train_data[:5000,1:257]   #256 pixel columns
Y1=train_data[:5000,0]       #label column
start_time = time.time()
numbersvm = svm.SVC(kernel='rbf', C=1).fit(X1,Y1)
print("---Time taken is %s seconds ---" % (time.time() - start_time))
---Time taken is 2.6671526432037354 seconds ---
In [41]:
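#Predict on the same 5000 rows and compare class counts
predict6 = numbersvm.predict(X1)
Y1 = pd.DataFrame(Y1)
Y1[0].value_counts()           #actual counts (not echoed)

predict6=pd.DataFrame(predict6)
predict6[0].value_counts()     #predicted counts, shown below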
Out[41]:
0.0    847
1.0    678
9.0    484
2.0    484
6.0    474
7.0    462
4.0    441
3.0    391
8.0    387
5.0    352
Name: 0, dtype: int64
In [42]:
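#Confusion Matrix (rows are actual digits, columns are predicted digits)
from sklearn.metrics import confusion_matrix
conf_mat = confusion_matrix(Y1,predict6)
conf_mat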
Out[42]:
array([[845,   0,   0,   0,   1,   0,   1,   0,   0,   0],
       [  0, 674,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   1, 478,   2,   3,   0,   0,   1,   3,   0],
       [  0,   0,   3, 385,   1,   2,   0,   0,   2,   2],
       [  0,   1,   0,   0, 428,   1,   1,   0,   0,   3],
       [  1,   0,   1,   2,   1, 346,   1,   0,   0,   0],
       [  1,   1,   1,   0,   3,   1, 471,   0,   0,   0],
       [  0,   0,   0,   0,   2,   0,   0, 455,   3,   1],
       [  0,   1,   1,   1,   1,   2,   0,   2, 379,   0],
       [  0,   0,   0,   1,   1,   0,   0,   4,   0, 478]])
In [43]:
Accuracy = numbersvm.score(X1,Y1)
Accuracy
Out[43]:
0.98780000000000001
In [44]:
#####Model on Full Data 
X2=train_data[:,range(1,257)]
Y2 =train_data[:,0]
import time
start_time = time.time()
numbersvm = svm.SVC(kernel='rbf', C=1).fit(X2,Y2)
print("---Time taken is %s seconds ---" % (time.time() - start_time)) 
---Time taken is 4.5982630252838135 seconds ---
In [45]:
#Confusion Matrix
predict7 = numbersvm.predict(X2)
conf_mat = confusion_matrix(Y2,predict7)
conf_mat
Out[45]:
array([[1191,    0,    0,    1,    1,    0,    1,    0,    0,    0],
       [   0, 1005,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    1,  717,    3,    7,    0,    0,    1,    2,    0],
       [   0,    0,    2,  647,    0,    3,    0,    1,    4,    1],
       [   0,    2,    0,    0,  645,    0,    2,    0,    0,    3],
       [   2,    0,    3,    3,    2,  544,    2,    0,    0,    0],
       [   2,    1,    1,    0,    4,    1,  655,    0,    0,    0],
       [   0,    0,    2,    0,    2,    0,    0,  635,    3,    3],
       [   0,    1,    1,    0,    3,    3,    0,    2,  532,    0],
       [   0,    0,    0,    2,    2,    0,    0,    6,    0,  634]])
In [46]:
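#Note: X1 and Y1 are the first 5000 training rows, so this is the full-data model
#scored on that subset; use numbersvm.score(X2,Y2) for the whole training set
print('Accuracy is : ',numbersvm.score(X1,Y1))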
Accuracy is :  0.9884
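The accuracies so far are measured on rows the model was fitted on, which is why a separate test set is scored next. The pattern can be tried without the USPS files using the sketch below. It is an illustration under stated assumptions: it swaps in scikit-learn's bundled 8 x 8 digits (not the 16 x 16 USPS data), uses a random 75/25 split instead of the separate train and test files, and spells out `gamma='scale'`, which the cells above leave at the library default.

```python
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Stand-in data: scikit-learn's bundled 8 x 8 digits, NOT the USPS files used in this post
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# RBF kernel with C=1 as in the post; gamma is set explicitly so the
# result does not depend on the installed version's default
model = svm.SVC(kernel='rbf', C=1, gamma='scale').fit(X_train, y_train)

train_acc = model.score(X_train, y_train)   # accuracy on the rows used for fitting
test_acc = model.score(X_test, y_test)      # accuracy on held-out rows

print("train accuracy:", train_acc)
print("test accuracy: ", test_acc)
```

The held-out figure is the one to report, since the training figure tends to be optimistic.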
In [47]:
###Out-of-sample validation with test data
Ex1 = test_data[:,range(1,257)]
Ey1 = test_data[:,0]
test_predict = numbersvm.predict(Ex1)
conf_mat = confusion_matrix(Ey1,test_predict)
conf_mat
Out[47]:
array([[355,   0,   2,   0,   1,   0,   0,   0,   1,   0],
       [  0, 255,   0,   0,   5,   0,   3,   0,   0,   1],
       [  3,   0, 181,   2,   5,   2,   0,   1,   4,   0],
       [  1,   0,   3, 146,   0,  10,   0,   1,   5,   0],
       [  0,   1,   3,   0, 188,   1,   1,   1,   1,   4],
       [  4,   0,   0,   4,   1, 147,   0,   0,   1,   3],
       [  3,   0,   3,   0,   2,   2, 159,   0,   1,   0],
       [  0,   0,   1,   0,   5,   1,   0, 137,   1,   2],
       [  3,   0,   2,   1,   0,   4,   1,   1, 153,   1],
       [  0,   0,   0,   1,   4,   0,   0,   0,   2, 170]])
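The final accuracy asked for in the exercise can be read off this matrix: the diagonal holds the correctly classified images. The sketch below copies the matrix from the output above and assumes the usual scikit-learn orientation (rows are actual digits, columns are predicted digits). The names `recall` and `hardest_digit` are introduced here for illustration only.

```python
import numpy

# Test confusion matrix copied from the output above
# (rows: actual digit, columns: predicted digit)
conf_mat = numpy.array([
    [355,   0,   2,   0,   1,   0,   0,   0,   1,   0],
    [  0, 255,   0,   0,   5,   0,   3,   0,   0,   1],
    [  3,   0, 181,   2,   5,   2,   0,   1,   4,   0],
    [  1,   0,   3, 146,   0,  10,   0,   1,   5,   0],
    [  0,   1,   3,   0, 188,   1,   1,   1,   1,   4],
    [  4,   0,   0,   4,   1, 147,   0,   0,   1,   3],
    [  3,   0,   3,   0,   2,   2, 159,   0,   1,   0],
    [  0,   0,   1,   0,   5,   1,   0, 137,   1,   2],
    [  3,   0,   2,   1,   0,   4,   1,   1, 153,   1],
    [  0,   0,   0,   1,   4,   0,   0,   0,   2, 170]])

correct = int(numpy.trace(conf_mat))   # diagonal = correctly classified images
total = int(conf_mat.sum())            # all test images
accuracy = correct / total

# Per-digit recall: share of each actual digit that was recovered
recall = numpy.diag(conf_mat) / conf_mat.sum(axis=1)
hardest_digit = int(recall.argmin())

print("correct:", correct, "of", total)
print("accuracy: %.4f" % accuracy)
print("lowest recall is for digit", hardest_digit)
```

Per-digit recall shows where the errors concentrate, which the single accuracy number hides.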
In [49]:
#Let's look at the wrong predictions.
#Collect the true labels of the misclassified test images
n_test = len(Ey1)
wrong_pred = numpy.zeros(n_test)
cnt=0
for i in range(0,n_test):
    if test_predict[i]!=Ey1[i]:
        wrong_pred[cnt]=Ey1[i]
        cnt= cnt+1
cnt
Out[49]:

116

Out of 2007 test images, only 116 were misclassified by our SVM model, which gives a test accuracy of 1891/2007, or about 94.2%.
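The counting loop can also be expressed as a vectorized comparison that keeps the positions of the errors, which is handy for looking up the misclassified images afterwards. This is a sketch with made-up arrays; `actual` and `predicted` are hypothetical stand-ins for `Ey1` and `test_predict`.

```python
import numpy

# Made-up labels and predictions standing in for Ey1 and test_predict
actual = numpy.array([3, 5, 8, 0, 1, 3, 9])
predicted = numpy.array([3, 3, 8, 0, 1, 5, 9])

wrong_idx = numpy.flatnonzero(predicted != actual)   # positions of misclassified images
n_wrong = wrong_idx.size
error_rate = n_wrong / actual.size

for i in wrong_idx:
    print("row %d: actual %d, predicted %d" % (i, actual[i], predicted[i]))
```

With the real arrays, each index in `wrong_idx` could be used to reshape the corresponding row of the test pixels to 16 x 16 and display it with `plt.imshow`.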

This post ends the series. In the next series we will cover Random Forest and Boosting.
