TP_Regression.ipynb 6.25 KB
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lab: Supervised Learning: Regression\n",
    "In this lab we will do regression, which analyzes how one variable relates to one or more others."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use the Boston dataset.\n",
    "https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html\n",
    "\n",
    "House prices in Boston (see the site for the variables)\n",
    "https://scikit-learn.org/stable/datasets/index.html#boston-dataset\n",
    "\n",
    "Import this morning's libraries: `numpy` and scikit-learn's `datasets` module.\n",
    "Read the dataset documentation\n",
    "\n",
    "Load the Boston dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn import datasets\n",
    "# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.\n",
    "boston = datasets.load_boston()\n",
    "X, y = boston.data, boston.target\n",
    "feature_names = boston.feature_names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploratory analysis and dataset preparation\n",
    "Study the correlations using `np.corrcoef`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Split the Boston dataset\n",
    "\n",
    "To do so, import and use the scikit-learn function `sklearn.model_selection.train_test_split`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear regression\n",
    "A classic model: not very powerful, but interpretable. Based on the mean squared error. Very sensitive to outliers.\n",
    "Find the model in scikit-learn."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run it on the Boston data. Display the coefficient of each feature. Which features are significant?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Regression tree\n",
    "![](https://i0.wp.com/freakonometrics.hypotheses.org/files/2015/06/boosting-algo-3.gif?zoom=2&w=456&ssl=1)\n",
    "\n",
    "Regression trees are very powerful models that split the feature space into regions in which all points receive the same output. See scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.gridspec as gridspec\n",
    "from matplotlib import pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try a maximum depth of 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try a maximum depth of 5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try a maximum depth of 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random forest\n",
    "Find it in scikit-learn\n",
    "image\n",
    "model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try with 3 trees"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try with 10 trees"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "100 trees"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare with the regression trees. What are the advantages?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_optional_ Plot the result with 1 tree, 3 trees and 100 trees"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## If you get bored\n",
    "Compare the different models by running all of them on the test set\n",
    "\n",
    "Run a regression on the output of a PCA (tricky)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}