Commit dd7785d7 authored by Rémy Huet

Some introduction

parent 8e23baf1
%% Cell type:markdown id:5c8980bd tags:
# AOS1 - Assignment
## Improving the accuracy and speed of support vector machines
Authors: Mathilde Rineau, Rémy Huet
## 17/10/2021
### Abstract
The paper "Improving the Accuracy and Speed of Support Vector Machines" by Burges and Schölkopf investigates a method to improve the speed and accuracy of a support vector machine (SVM).
As the authors note, SVMs are widely used in several applications.
They distinguish two kinds of improvement:
- improving the generalization performance;
- improving the speed in test phase.
The authors propose and combine two methods to improve SVM performance: the "virtual support vector" method and the "reduced set" method.
With both improvements combined, they report a machine that is 22 times faster than the original one and more accurate (1.1% vs. 1.4% error).
In this work, we describe and implement the two techniques they use, to check whether these methods perform as claimed.
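To make the speed/accuracy trade-off concrete, the decision function of a kernel SVM and the idea behind the reduced set method can be written as below. The notation ($N_S$, $s_i$, $\alpha_i$, $N_Z$, $z_k$, $\beta_k$) is chosen here for illustration and may differ from the paper's.

$$
f(x) = \operatorname{sgn}\Big(\sum_{i=1}^{N_S} \alpha_i y_i K(s_i, x) + b\Big)
$$

Classifying one test point requires $N_S$ kernel evaluations, one per support vector $s_i$, so the cost in test phase grows linearly with $N_S$. The reduced set method replaces the expansion over the support vectors by a shorter one in feature space:

$$
\Psi = \sum_{i=1}^{N_S} \alpha_i y_i \Phi(s_i) \;\approx\; \Psi' = \sum_{k=1}^{N_Z} \beta_k \Phi(z_k), \qquad N_Z < N_S,
$$

where the vectors $z_k$ and weights $\beta_k$ are chosen to keep $\rho = \lVert \Psi - \Psi' \rVert$ small. The speedup is then roughly $N_S / N_Z$, at the price of an approximate decision surface.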
%% Cell type:code id:9f152334 tags:
``` python
# We work on the MNIST data set, loaded with fetch_openml
from sklearn.datasets import fetch_openml
import pandas as pd
import matplotlib.pyplot as plt
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
# Print the characteristics of X and y
print(X.shape)
print(y.shape)
```
%% Output
(70000, 784)
(70000,)
%% Cell type:code id:4d3fa1c7 tags:
``` python
# We divide the data set into two parts: a train set and a test set
# According to the recommended values, the train set has 60000 samples and the test set 10000
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=60000, test_size=10000)
```
%% Cell type:code id:d809fc87 tags:
``` python
# First, we fit an SVC without any preprocessing or improvement in accuracy or speed
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
# We use the hyperparameter C=10 and a polynomial kernel of degree 5,
# according to the recommendations
svc = SVC(C=10, kernel='poly', degree=5)
svc.fit(X_train, y_train)
```
%% Output
SVC(C=10, degree=5, kernel='poly')
%% Cell type:code id:8cb28178 tags:
``` python
# We predict the labels of the test set
y_pred = svc.predict(X_test)
```
%% Cell type:code id:c1248238 tags:
``` python
# We compute the confusion matrix
print(confusion_matrix(y_test, y_pred))
```
%% Output
[[ 923 1 2 0 0 2 3 1 3 0]
[ 0 1157 4 1 0 1 1 3 2 0]
[ 7 10 925 4 0 0 5 2 1 0]
[ 3 7 3 1000 0 10 0 0 7 5]
[ 1 11 5 1 952 0 1 0 3 8]
[ 6 9 1 8 0 875 3 1 3 1]
[ 7 8 0 0 2 7 952 0 1 0]
[ 1 7 5 1 1 1 0 1070 2 11]
[ 3 8 4 8 0 10 0 2 905 4]
[ 2 6 2 5 6 3 0 11 6 957]]
%% Cell type:code id:ba4e38ac tags:
``` python
# We print the classification report
print(classification_report(y_test, y_pred))
```
%% Output
              precision    recall  f1-score   support
           0       0.97      0.99      0.98       935
           1       0.95      0.99      0.97      1169
           2       0.97      0.97      0.97       954
           3       0.97      0.97      0.97      1035
           4       0.99      0.97      0.98       982
           5       0.96      0.96      0.96       907
           6       0.99      0.97      0.98       977
           7       0.98      0.97      0.98      1099
           8       0.97      0.96      0.96       944
           9       0.97      0.96      0.96       998
    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000
%% Cell type:code id:947b0895 tags:
``` python
# We print the accuracy of the SVC and the error rate
print("Accuracy: ",accuracy_score(y_test, y_pred))
print("Error rate: ",(1-accuracy_score(y_test, y_pred))*100,"%")
```
%% Output
Accuracy: 0.9716
Error rate: 2.839999999999998 %
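As a cross-check, the same figures can be recovered directly from the confusion matrix printed earlier: the correctly classified samples lie on the diagonal. The sketch below copies that matrix by hand; the variable names are ours and are not part of the notebook.

```python
import numpy as np

# Confusion matrix copied from the notebook output
# (rows: true digit, columns: predicted digit).
cm = np.array([
    [923,    1,   2,    0,   0,   2,   3,    1,   3,   0],
    [  0, 1157,   4,    1,   0,   1,   1,    3,   2,   0],
    [  7,   10, 925,    4,   0,   0,   5,    2,   1,   0],
    [  3,    7,   3, 1000,   0,  10,   0,    0,   7,   5],
    [  1,   11,   5,    1, 952,   0,   1,    0,   3,   8],
    [  6,    9,   1,    8,   0, 875,   3,    1,   3,   1],
    [  7,    8,   0,    0,   2,   7, 952,    0,   1,   0],
    [  1,    7,   5,    1,   1,   1,   0, 1070,   2,  11],
    [  3,    8,   4,    8,   0,  10,   0,    2, 905,   4],
    [  2,    6,   2,    5,   6,   3,   0,   11,   6, 957],
])

correct = np.trace(cm)            # sum of the diagonal
total = cm.sum()                  # number of test samples
accuracy = correct / total
error_rate_pct = 100 * (1 - accuracy)

# Per-class recall: diagonal entry divided by the row sum (the class support).
per_class_recall = np.diag(cm) / cm.sum(axis=1)

print(f"Accuracy: {accuracy:.4f}")
print(f"Error rate: {error_rate_pct:.2f} %")
```

Formatting the error rate with two decimals also avoids the floating-point tail visible in the raw output.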
%% Cell type:code id:81b09df7 tags:
``` python
# Next, we generate new training data by translating the resulting support vectors
# by one pixel in each of four directions. We first inspect the support vectors.
import numpy as np
print(svc.support_vectors_)
print(svc.support_vectors_.shape)
print(np.mean(svc.support_vectors_[0]))
print(svc.support_vectors_[0][1])
```
%% Output
[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]
(8164, 784)
23.246173469387756
0.0
%% Cell type:code id:0e648133 tags:
``` python
def right_side_rescaling(support_vectors):
    # Circularly shift the flattened array by one position
    n, m = support_vectors.shape
    support_vector_lin = support_vectors.reshape((-1, n*m))
    temp = support_vector_lin[0][0]
    # Move every element one position towards the front; the bound is n*m-1
    # (not n*m-2) so that the second-to-last element is shifted as well
    for i in range(n*m - 1):
        support_vector_lin[0][i] = support_vector_lin[0][i+1]
    # The first element wraps around to the end
    support_vector_lin[0][n*m-1] = temp
    support_vectors = support_vector_lin.reshape(n, m)
    return support_vectors
```
%% Cell type:code id:aa5535c9 tags:
``` python
m = []
m.append([1,2,3,4,5])
m.append([1,2,3,4,5])
print(right_side_rescaling(np.array(m)))
```
%% Output
[[2 3 4 5 1]
[2 3 4 5 1]]
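The helper above rotates the whole flattened array by one position, so pixels wrap from the end of one row (and of one image) into the next. A per-image translation could instead be sketched as follows, assuming each row is a flattened square image (28x28 for MNIST) and that vacated pixels are filled with zeros. The function names `translate` and `virtual_support_vectors` are hypothetical, and keeping the original vectors alongside their shifted copies is our assumption.

```python
import numpy as np

def translate(images, dy, dx):
    """Shift a stack of 2-D images by (dy, dx) pixels, with |dy|, |dx| <= 1.

    Pixels shifted out of the frame are dropped; vacated pixels are set to 0.
    """
    _, h, w = images.shape
    padded = np.pad(images, ((0, 0), (1, 1), (1, 1)))  # one-pixel zero border
    return padded[:, 1 - dy:1 - dy + h, 1 - dx:1 - dx + w]

def virtual_support_vectors(vectors, labels, side=28):
    """Return the vectors plus their four one-pixel translations, with labels."""
    images = vectors.reshape(-1, side, side)
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    shifted = [translate(images, dy, dx).reshape(len(vectors), -1)
               for dy, dx in offsets]
    X_virtual = np.concatenate([vectors] + shifted)
    # Each translated copy keeps the label of the vector it was derived from.
    y_virtual = np.tile(labels, len(offsets) + 1)
    return X_virtual, y_virtual

# Tiny demonstration on a single 3x3 "image".
demo = np.arange(1, 10, dtype=float).reshape(1, 9)
X_virtual, y_virtual = virtual_support_vectors(demo, np.array(["7"]), side=3)
print(X_virtual.shape)              # (5, 9)
print(X_virtual[4].reshape(3, 3))   # the copy shifted right by one pixel
```

With the notebook's objects, the inputs would presumably be `svc.support_vectors_` and `y_train[svc.support_]`, since `support_` holds the indices of the support vectors in the training set. A second `SVC` with the same parameters could then be fit on the result.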
%% Cell type:code id:21db8ae3 tags:
``` python
```
%% Cell type:code id:9bb8ab5a tags:
``` python
```
......