Commit dd7785d7 by Rémy Huet 💻

### Some introduction

parent 8e23baf1
%% Cell type:markdown id:5c8980bd tags:

# AOS1 - Assignment

## Improving the accuracy and speed of support vector machines

Authors: Mathilde Rineau, Rémy Huet

17/10/2021

### Abstract

The paper "Improving the Accuracy and Speed of Support Vector Machines" by Burges and Schölkopf investigates a method to improve the speed and accuracy of a support vector machine. As the authors note, SVMs are widely used in many applications. To improve this method, the authors distinguish two types of improvement to achieve:

- improving the generalization performance;
- improving the speed in the test phase.

The authors propose and combine two methods to improve SVM performance: the "virtual support vector" method and the "reduced set" method. With these two improvements, they report a machine much faster (22 times faster than the original) and more precise (1.1% vs 1.4% error rate) than the original one. In this work, we will describe and implement the two techniques they used to see whether these methods work as claimed.
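Before running the experiments, the "virtual support vector" idea can be sketched in a few lines. This is a minimal, hypothetical helper (not the paper's exact procedure): it assumes 28×28 MNIST images flattened to 784-vectors, and uses NumPy's circular `np.roll` as a simplified stand-in for the paper's zero-padded one-pixel translation.

``` python
import numpy as np

def virtual_svs(support_vectors, labels, side=28):
    """Generate "virtual support vectors" by shifting each image one pixel
    up, down, left and right (np.roll wraps around at the border, a
    simplification of a true zero-padded translation)."""
    imgs = support_vectors.reshape(-1, side, side)
    # axis 1 is rows (vertical shift), axis 2 is columns (horizontal shift)
    shifted = [np.roll(imgs, s, axis=a) for a, s in ((1, 1), (1, -1), (2, 1), (2, -1))]
    X_virtual = np.concatenate(shifted).reshape(-1, side * side)
    y_virtual = np.tile(labels, 4)          # each copy keeps its original label
    return X_virtual, y_virtual
```

Each input image thus yields four translated copies, which would be appended to the training set before retraining the SVM.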
%% Cell type:code id:9f152334 tags:

``` python
# We will work on the MNIST data set
# We load it with fetch_openml
from sklearn.datasets import fetch_openml
import pandas as pd
import matplotlib.pyplot as plt

X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
# We print the characteristics of X and y
print(X.shape)
print(y.shape)
```

%% Output

(70000, 784)
(70000,)

%% Cell type:code id:4d3fa1c7 tags:

``` python
# We divide the data set in two parts: train set and test set
# According to the recommended values, the train set's size is 60000
# and the test set's size is 10000
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=60000, test_size=10000)
```

%% Cell type:code id:d809fc87 tags:

``` python
# First, we perform a SVC without any preprocessing or improvement
# in terms of accuracy or speed
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

# We fit the default SVC with the hyperparameter C=10 and a polynomial
# kernel of degree 5, according to the recommendations
svc = SVC(C=10, kernel='poly', degree=5)
svc.fit(X_train, y_train)
```

%% Output

SVC(C=10, degree=5, kernel='poly')

%% Cell type:code id:8cb28178 tags:

``` python
# We predict the values for our test set
y_pred = svc.predict(X_test)
```

%% Cell type:code id:c1248238 tags:

``` python
# We compute the confusion matrix
print(confusion_matrix(y_test, y_pred))
```

%% Output

[[ 923    1    2    0    0    2    3    1    3    0]
 [   0 1157    4    1    0    1    1    3    2    0]
 [   7   10  925    4    0    0    5    2    1    0]
 [   3    7    3 1000    0   10    0    0    7    5]
 [   1   11    5    1  952    0    1    0    3    8]
 [   6    9    1    8    0  875    3    1    3    1]
 [   7    8    0    0    2    7  952    0    1    0]
 [   1    7    5    1    1    1    0 1070    2   11]
 [   3    8    4    8    0   10    0    2  905    4]
 [   2    6    2    5    6    3    0   11    6  957]]

%% Cell type:code id:ba4e38ac tags:

``` python
# We print the classification report
print(classification_report(y_test, y_pred))
```

%% Output

              precision    recall  f1-score   support

           0       0.97      0.99      0.98       935
           1       0.95      0.99      0.97      1169
           2       0.97      0.97      0.97       954
           3       0.97      0.97      0.97      1035
           4       0.99      0.97      0.98       982
           5       0.96      0.96      0.96       907
           6       0.99      0.97      0.98       977
           7       0.98      0.97      0.98      1099
           8       0.97      0.96      0.96       944
           9       0.97      0.96      0.96       998

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000

%% Cell type:code id:947b0895 tags:

``` python
# We print the accuracy of the SVC and the error rate
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Error rate: ", (1 - accuracy_score(y_test, y_pred)) * 100, "%")
```

%% Output

Accuracy:  0.9716
Error rate:  2.839999999999998 %

%% Cell type:code id:81b09df7 tags:

``` python
# We will then generate new training data by translating the resulting
# support vectors by one pixel in each of four directions;
# first, we inspect the support vectors
import numpy as np

print(svc.support_vectors_)
print(svc.support_vectors_.shape)
print(np.mean(svc.support_vectors_[0]))
print(svc.support_vectors_[0][1])
```

%% Output

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
(8164, 784)
23.246173469387756
0.0

%% Cell type:code id:0e648133 tags:

``` python
def right_side_rescaling(support_vectors):
    # Circularly shift the flattened array one position to the left,
    # then restore the original shape
    n, m = support_vectors.shape
    support_vector_lin = support_vectors.reshape((-1, n * m))
    temp = support_vector_lin[0][0]
    # shift every element (fixed off-by-one: the loop must run to n*m - 1,
    # otherwise the second-to-last element is never updated)
    for i in range(n * m - 1):
        support_vector_lin[0][i] = support_vector_lin[0][i + 1]
    support_vector_lin[0][n * m - 1] = temp
    support_vectors = support_vector_lin.reshape(n, m)
    return support_vectors
```

%% Cell type:code id:aa5535c9 tags:

``` python
m = []
m.append([1, 2, 3, 4, 5])
m.append([1, 2, 3, 4, 5])
print(right_side_rescaling(np.array(m)))
```

%% Output

[[2 3 4 5 1]
 [2 3 4 5 1]]

%% Cell type:code id:21db8ae3 tags:

``` python
```

%% Cell type:code id:9bb8ab5a tags:

``` python
```
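The element-wise loop in `right_side_rescaling` performs a circular left shift of the flattened array one position at a time. The same transform can be expressed in a single vectorized call; this is a sketch that keeps the function's flatten-then-reshape behaviour (the name `circular_left_shift` is ours, not from the paper):

``` python
import numpy as np

def circular_left_shift(a):
    """Circularly shift the flattened array one position to the left,
    then restore the original shape (vectorized equivalent of the
    element-wise loop in right_side_rescaling)."""
    return np.roll(a.reshape(-1), -1).reshape(a.shape)
```

Besides being shorter, the vectorized version avoids mutating its input in place, since `np.roll` allocates a new array.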