Commit 21870265 authored by Rémy Huet's avatar Rémy Huet 💻

TP6 -> looooooong clf

parent ec4ef403
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AOS1\n",
"## TP3 - Kernel methods"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from sklearn.datasets import fetch_lfw_people"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question 1\n",
"\n",
"Fetch the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"faces = fetch_lfw_people(min_faces_per_person=60)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(faces.target)\n",
"print(faces.target_names)\n",
"print(np.unique(faces.target))\n",
"\n",
"print(faces.images.shape)\n",
"print(faces.data.shape) # Images \"flattened\"\n",
"\n",
"plt.imshow(faces.images[5])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each sample has 2914 features (pixels).\n",
"We will first do a PCA to reduce the number of features before learning the SVM"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question 2\n",
"\n",
"We split the data in train and test sets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(faces.data, faces.target)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question 3\n",
"\n",
"We use a PCA to reduce the number of features to 100."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.decomposition import PCA\n",
"\n",
"pca = PCA(n_components=100, whiten=True)\n",
"\n",
"X_train_pca = pca.fit_transform(X_train)\n",
"\n",
"print(pca.n_components_)\n",
"print(pca.explained_variance_ratio_)\n",
"print(np.sum(pca.explained_variance_ratio_))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With 100 components, we keep more than 90 % of the explained variance.\n",
"\n",
"Now we want to train a vanilla SVM on this data."
]
},
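{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (an added illustration, not part of the original questions), we can plot the cumulative explained variance ratio of the fitted PCA to see how the retained variance grows with the number of components:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Cumulative explained variance of the PCA fitted above\n",
"plt.plot(np.cumsum(pca.explained_variance_ratio_))\n",
"plt.xlabel('Number of components')\n",
"plt.ylabel('Cumulative explained variance')\n",
"plt.show()"
]
},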
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.svm import SVC\n",
"from sklearn.metrics import confusion_matrix\n",
"from sklearn.metrics import classification_report\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"svc = SVC()\n",
"svc.fit(X_train_pca, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the trained model, we can predict labels for the test dataset and compare them to the test targets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_test_pca = pca.transform(X_test)\n",
"y = svc.predict(X_test_pca)\n",
"\n",
"print(confusion_matrix(y_test, y))\n",
"print(classification_report(y_test, y))\n",
"print(accuracy_score(y_test, y))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question 4\n",
"\n",
"The SVM was trained with default hyperparameters.\n",
"\n",
"These parameters are the following:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(svc.C, svc.gamma, svc.kernel)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question 5\n",
"\n",
"We use a GridSearchCV to perform a search over the hyperparameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"parameters = {'C': np.logspace(-2, 3, 10), 'gamma': np.logspace(-4, 1, 10)}\n",
"\n",
"clf = GridSearchCV(svc, parameters)\n",
"print(clf)\n",
"\n",
"clf.fit(X_train_pca, y_train)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"clf.best_params_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With these parameters, we retrain the SVM and evaluate it on the test set:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"svc = SVC(C=clf.best_params_['C'], gamma=clf.best_params_['gamma'])\n",
"svc.fit(X_train_pca, y_train)\n",
"\n",
"y = svc.predict(X_test_pca)\n",
"\n",
"print(classification_report(y_test, y))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question 6\n",
"\n",
"We want to add the number of principal components used to the cross-validated grid search."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.pipeline import make_pipeline\n",
"\n",
"pca = PCA(whiten=True)\n",
"svc = SVC()\n",
"\n",
"estimator = make_pipeline(pca, svc)\n",
"\n",
"parameters = {\n",
" 'pca__n_components': range(10, 110, 10), # range(100) would include 0 components (invalid) and make the search extremely long\n",
" 'svc__C' : np.logspace(-2, 3, 10),\n",
" 'svc__gamma' : np.logspace(-4, 1, 10)\n",
"}\n",
"\n",
"clf = GridSearchCV(estimator, parameters)\n",
"clf.fit(X_train, y_train) # the pipeline applies the PCA itself, so fit on the raw training data\n",
"clf.best_params_"
]
}
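,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally (an added check, not in the original questions), the best pipeline found by the grid search can be evaluated directly on the raw test images, since the pipeline applies the PCA itself:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# GridSearchCV refits the best pipeline on the whole training set by default,\n",
"# so we can predict on the raw (non-PCA-transformed) test data directly\n",
"y_pred = clf.predict(X_test)\n",
"print(classification_report(y_test, y_pred))"
]
}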
],
"metadata": {
"interpreter": {
"hash": "3abb0a1ef4892304d86bb3a3dfd052bcca35057beadba016173999c775e8d3ba"
},
"kernelspec": {
"display_name": "Python 3.9.7 64-bit ('AOS1-QteoCFsS': pipenv)",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}