Commit 5d87da76 authored by Rémy Huet

End of TP6
%% Cell type:markdown id: tags:
# AOS1
## TP3 - Kernel methods
%% Cell type:code id: tags:
```
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people
```
%% Cell type:markdown id: tags:
### Question 1
Fetch the data
%% Cell type:code id: tags:
```
faces = fetch_lfw_people(min_faces_per_person=60)
```
%% Cell type:code id: tags:
```
print(faces.target)
print(faces.target_names)
print(np.unique(faces.target))
print(faces.images.shape)
print(faces.data.shape) # Images "flattened"
plt.imshow(faces.images[5])
```
%% Cell type:markdown id: tags:
Each sample has 2914 features (pixels).
We first apply a PCA to reduce the number of features before training the SVM.
%% Cell type:markdown id: tags:
### Question 2
We split the data in train and test sets.
%% Cell type:code id: tags:
```
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(faces.data, faces.target)
```
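%% Cell type:markdown id: tags:
A possible refinement (not in the original notebook): the LFW classes are imbalanced, so passing `stratify=y` to `train_test_split` keeps the per-class proportions similar in both splits. A minimal sketch on synthetic labels standing in for `faces.target`:
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy labels (15 of class 0, 5 of class 1);
# stratify=y preserves the 3:1 class ratio in each split.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

print(np.bincount(y_tr), np.bincount(y_te))
```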
%% Cell type:markdown id: tags:
### Question 3
We use a PCA to reduce the number of features to 100.
%% Cell type:code id: tags:
```
from sklearn.decomposition import PCA
pca = PCA(n_components=100, whiten=True)
X_train_pca = pca.fit_transform(X_train)
print(pca.n_components_)
print(pca.explained_variance_ratio_)
print(np.sum(pca.explained_variance_ratio_))
```
%% Cell type:markdown id: tags:
With 100 components, we keep more than 90 % of the explained variance.
Now we want to train a vanilla SVM on this data.
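%% Cell type:markdown id: tags:
One way to check how many components are needed (not part of the original notebook; sketched here on random correlated data as a stand-in for `faces.data`) is the cumulative explained-variance curve:
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn.decomposition import PCA

# Random correlated data as a stand-in for the face vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50)) @ rng.normal(size=(50, 50))

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components keeping 90 % of the variance.
n_90 = int(np.searchsorted(cumvar, 0.90)) + 1
print(n_90, cumvar[n_90 - 1])
```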
%% Cell type:code id: tags:
```
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
svc = SVC()
svc.fit(X_train_pca, y_train)
```
%% Cell type:markdown id: tags:
With the trained model, we can predict the labels of the test set and compare them to the test targets.
%% Cell type:code id: tags:
```
X_test_pca = pca.transform(X_test)
y = svc.predict(X_test_pca)
print(confusion_matrix(y_test, y))
print(classification_report(y_test, y))
print(accuracy_score(y_test, y))
```
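%% Cell type:markdown id: tags:
As a reading aid (illustrative numbers, not the LFW results): the diagonal of the confusion matrix counts the correct predictions per class, so per-class recall is the diagonal divided by the row sums, and accuracy is the trace divided by the total count.
%% Cell type:code id: tags:
```python
import numpy as np

# Illustrative 3-class confusion matrix (rows = true, columns = predicted).
cm = np.array([[8, 1, 1],
               [2, 7, 1],
               [0, 1, 9]])

recall = np.diag(cm) / cm.sum(axis=1)   # per-class recall
accuracy = np.trace(cm) / cm.sum()      # overall accuracy
print(recall, accuracy)  # [0.8 0.7 0.9] 0.8
```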
%% Cell type:markdown id: tags:
### Question 4
The SVM was trained with default hyperparameters.
These parameters are the following:
%% Cell type:code id: tags:
```
print(svc.C)
print(svc.gamma)
```
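%% Cell type:markdown id: tags:
For reference, the defaults are `C=1.0` and `gamma='scale'`; scikit-learn documents that `'scale'` is resolved at fit time to `1 / (n_features * X.var())`. A quick check on random data:
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = rng.integers(0, 2, size=50)

svc = SVC().fit(X, y)
# gamma='scale' resolves to 1 / (n_features * X.var()) at fit time.
print(svc.C, svc.gamma, 1.0 / (X.shape[1] * X.var()))
```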
%% Cell type:markdown id: tags:
### Question 5
We use a `GridSearchCV` to perform a search over the hyperparameters.
%% Cell type:code id: tags:
```
from sklearn.model_selection import GridSearchCV
parameters = {'C': np.logspace(-2, 3, 10), 'gamma': np.logspace(-4, 1, 10)}
clf = GridSearchCV(svc, parameters)
print(clf)
clf.fit(X_train_pca, y_train)
```
%% Cell type:code id: tags:
```
clf.best_params_
```
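%% Cell type:markdown id: tags:
Besides `best_params_`, `GridSearchCV` also exposes `best_score_` (the mean cross-validated accuracy of the best setting) and `cv_results_` (one entry per parameter combination). A small sketch on the bundled digits dataset, since the LFW search itself is long:
%% Cell type:code id: tags:
```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
grid = {'C': [0.1, 1, 10], 'gamma': ['scale']}
clf = GridSearchCV(SVC(), grid, cv=3).fit(X, y)

print(clf.best_params_)            # best parameter combination
print(clf.best_score_)             # its mean CV accuracy
print('mean_test_score' in clf.cv_results_)
```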
%% Cell type:markdown id: tags:
With these parameters:
%% Cell type:code id: tags:
```
svc = SVC(C=clf.best_params_['C'], gamma=clf.best_params_['gamma'])
svc.fit(X_train_pca, y_train)
y = svc.predict(X_test_pca)
print(classification_report(y_test, y))
```
%% Cell type:markdown id: tags:
### Question 6
We want to add the number of principal components to the cross-validated search.
%% Cell type:code id: tags:
```
from sklearn.pipeline import make_pipeline
pca = PCA(whiten=True)
svc = SVC()
estimator = make_pipeline(pca, svc)
parameters = {
'pca__n_components': [80, 90, 100, 110],
'svc__C' : np.logspace(-2, 3, 10),
'svc__gamma' : np.logspace(-4, 1, 10)
}
clf = GridSearchCV(estimator, parameters)
clf.fit(X_train, y_train)  # the pipeline applies PCA itself, so fit on raw X_train
clf.best_params_
```
%% Cell type:markdown id: tags:
Save the output of the previous cell (it takes more than 45 minutes to run):
```
{'pca__n_components': 100,
'svc__C': 5.994842503189409,
'svc__gamma': 0.004641588833612782}
```
We recover the same parameters as in the previous search.
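%% Cell type:markdown id: tags:
Note on the pipeline: a fitted `make_pipeline(PCA(...), SVC())` chains `pca.transform` with `svc.predict`, so it consumes raw feature vectors on both fit and predict, and no manual PCA transform of the test set is needed. A sketch on the bundled digits dataset (hypothetical numbers, not the LFW ones):
%% Cell type:code id: tags:
```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The pipeline applies PCA internally on both fit and predict.
model = make_pipeline(PCA(n_components=30, whiten=True), SVC())
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```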