Commit dd7785d7 authored by Rémy Huet's avatar Rémy Huet 💻

Some introduction

parent 8e23baf1
%% Cell type:markdown id:5c8980bd tags:
# AOS1 - Assignment
## Improving the accuracy and speed of support vector machines

Authors: Mathilde Rineau, Rémy Huet

17/10/2021

### Abstract
The paper "Improving the Accuracy and Speed of Support Vector Machines" by Burges and Schölkopf investigates methods to improve the speed and accuracy of a support vector machine.
As the authors note, SVMs are widely used in many applications.
To improve this method, the authors distinguish two types of improvement to achieve:
- improving the generalization performance;
- improving the speed in the test phase.

The authors propose and combine two methods to improve SVM performance: the "virtual support vector" method and the "reduced set" method.
With these two improvements, they report a machine that is much faster (22 times faster) and more accurate (1.1% vs 1.4% error) than the original one.
In this work, we describe and implement these two techniques to check whether they work as claimed.
%% Cell type:code id:9f152334 tags:
``` python
# We will work on the MNIST dataset
# We load it from fetch_openml
from sklearn.datasets import fetch_openml
import pandas as pd
import matplotlib.pyplot as plt

X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
# We print the characteristics of X and y
print(X.shape)
print(y.shape)
```
%% Output
(70000, 784)
(70000,)
%% Cell type:code id:4d3fa1c7 tags:
``` python
# We split the dataset into two parts: a train set and a test set
# Following the recommended values, the train set has 60000 samples and the test set 10000
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, train_size=60000, test_size=10000)
```
%% Cell type:code id:d809fc87 tags:
``` python
# First, we train an SVC without any preprocessing or accuracy/speed improvement
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

# We use the recommended hyperparameters: C=10 and a polynomial kernel of degree 5
svc = SVC(C=10, kernel='poly', degree=5)
svc.fit(X_train, y_train)
```
%% Output
SVC(C=10, degree=5, kernel='poly')
%% Cell type:code id:8cb28178 tags:
``` python
# We predict the labels of the test set
y_pred = svc.predict(X_test)
```
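Since the paper's second goal is test-phase speed, it may also be worth measuring prediction time; a small helper like the following could be used (a sketch using `time.perf_counter`; `timed_predict` is an illustrative name, not part of the assignment):

``` python
import time

def timed_predict(model, X):
    """Return the model's predictions together with the elapsed wall-clock time."""
    start = time.perf_counter()
    pred = model.predict(X)
    return pred, time.perf_counter() - start
```

For instance, `y_pred, elapsed = timed_predict(svc, X_test)` would give a baseline test-phase time to compare against the improved machine.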
%% Cell type:code id:c1248238 tags:
``` python
# We compute the confusion matrix
print(confusion_matrix(y_test, y_pred))
```
%% Output
[[ 923 1 2 0 0 2 3 1 3 0]
[ 0 1157 4 1 0 1 1 3 2 0]
[ 7 10 925 4 0 0 5 2 1 0]
[ 3 7 3 1000 0 10 0 0 7 5]
[ 1 11 5 1 952 0 1 0 3 8]
[ 6 9 1 8 0 875 3 1 3 1]
[ 7 8 0 0 2 7 952 0 1 0]
[ 1 7 5 1 1 1 0 1070 2 11]
[ 3 8 4 8 0 10 0 2 905 4]
[ 2 6 2 5 6 3 0 11 6 957]]
%% Cell type:code id:ba4e38ac tags:
``` python
# We print the classification report
print(classification_report(y_test, y_pred))
```
%% Output
              precision    recall  f1-score   support

           0       0.97      0.99      0.98       935
           1       0.95      0.99      0.97      1169
           2       0.97      0.97      0.97       954
           3       0.97      0.97      0.97      1035
           4       0.99      0.97      0.98       982
           5       0.96      0.96      0.96       907
           6       0.99      0.97      0.98       977
           7       0.98      0.97      0.98      1099
           8       0.97      0.96      0.96       944
           9       0.97      0.96      0.96       998

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000
%% Cell type:code id:947b0895 tags:
``` python
# We print the accuracy of the SVC and the error rate
print("Accuracy: ",accuracy_score(y_test, y_pred))
print("Error rate: ",(1-accuracy_score(y_test, y_pred))*100,"%")
```
%% Output
Accuracy: 0.9716
Error rate: 2.839999999999998 %
%% Cell type:code id:81b09df7 tags:
``` python
# We then generate new training data by translating the resulting support vectors
# by one pixel in each of the four directions
import numpy as np
print(svc.support_vectors_)
print(svc.support_vectors_.shape)
print(np.mean(svc.support_vectors_[0]))
print(svc.support_vectors_[0][1])
```
%% Output
[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]
(8164, 784)
23.246173469387756
0.0
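One-pixel translations of this kind can be sketched as follows (a minimal sketch assuming 28x28 MNIST images stored as flat 784-vectors; `translate_image` and `virtual_support_vectors` are illustrative names, not from the paper):

``` python
import numpy as np

def translate_image(flat_img, dx, dy, size=28):
    # Shift a flattened size x size image by (dx, dy) pixels,
    # filling the vacated border with zeros
    img = flat_img.reshape(size, size)
    shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    # Zero out the wrapped-around rows/columns so the shift is a true translation
    if dy > 0:
        shifted[:dy, :] = 0
    elif dy < 0:
        shifted[dy:, :] = 0
    if dx > 0:
        shifted[:, :dx] = 0
    elif dx < 0:
        shifted[:, dx:] = 0
    return shifted.reshape(-1)

def virtual_support_vectors(support_vectors, labels):
    # Four translated copies (right, left, down, up) of each support vector
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    new_X, new_y = [], []
    for sv, label in zip(support_vectors, labels):
        for dx, dy in shifts:
            new_X.append(translate_image(sv, dx, dy))
            new_y.append(label)
    return np.array(new_X), np.array(new_y)
```

Applied to the 8164 support vectors found above, this would produce 4 x 8164 virtual examples to retrain on.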
%% Cell type:code id:0e648133 tags:
``` python
def right_side_rescaling(support_vectors):
    # Circularly shift the flattened matrix one position to the left,
    # then restore the original shape
    n, m = support_vectors.shape
    support_vector_lin = support_vectors.reshape(1, n * m)
    temp = support_vector_lin[0][0]
    for i in range(n * m - 1):
        support_vector_lin[0][i] = support_vector_lin[0][i + 1]
    support_vector_lin[0][n * m - 1] = temp
    return support_vector_lin.reshape(n, m)
```
%% Cell type:code id:aa5535c9 tags:
``` python
# Quick check of the function on a small matrix
m = []
m.append([1, 2, 3, 4, 5])
m.append([1, 2, 3, 4, 5])
print(right_side_rescaling(np.array(m)))
```
%% Output
[[2 3 4 5 1]
 [2 3 4 5 1]]
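An equivalent circular shift can be written more idiomatically with `np.roll` (a sketch; note that, like the loop version, this rotates the whole flattened matrix rather than translating each image independently):

``` python
import numpy as np

def circular_left_shift(support_vectors):
    # Flatten, rotate every element one position to the left
    # (the first element wraps around to the end), and restore the shape
    n, m = support_vectors.shape
    return np.roll(support_vectors.reshape(-1), -1).reshape(n, m)
```

For example, `circular_left_shift(np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]))` returns `[[2 3 4 5 1], [2 3 4 5 1]]`.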