{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Supervised Learning Lab: Regression\n",
"In this lab we will perform regression, which models the relationship between one variable and one or more other variables."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use the Boston data.\n",
"https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html\n",
"\n",
"House prices in Boston (see the site for the variable descriptions):\n",
"https://scikit-learn.org/stable/datasets/index.html#boston-dataset\n",
"\n",
"Import the libraries from this morning: `numpy` and `sklearn.datasets`.\n",
"Consult the dataset documentation.\n",
"\n",
"Load the Boston dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory analysis and dataset preparation\n",
"Study the correlations using `np.corrcoef`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Split the Boston dataset.\n",
"\n",
"To do so, import and use the scikit-learn function `sklearn.model_selection.train_test_split`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Linear regression\n",
"A classic model: not very powerful, but interpretable. It is fitted by minimizing the mean squared error, which makes it very sensitive to outliers.\n",
"Find the model in scikit-learn."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fit the model on the Boston data. Display the coefficient of each feature. Which features are significant?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Regression tree\n",
"![](https://fr.wikipedia.org/wiki/Arbre_de_d%C3%A9cision#/media/File:Arbre_de_decision.jpg)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try a maximum depth of 3."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try a maximum depth of 5."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try a maximum depth of 10."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compare the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Random forest\n",
"Find the model in scikit-learn.\n",
"image\n",
"model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try with 3 trees."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try with 10 trees."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try with 100 trees."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compare with the regression trees. What are the advantages?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Optional:_ Plot the results with 1 tree, 3 trees, and 100 trees."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## If you get bored\n",
"Compare the different models by running them all on the test set.\n",
"\n",
"Run a regression on the output of a PCA (tricky).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}