TP_Regression.ipynb 6.25 KB
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Supervised Learning Lab: Regression\n",
    "In this lab we will practice regression, which models how one variable depends on one or more other variables."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use the Boston dataset.\n",
    "https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html\n",
    "\n",
    "Boston house prices (see the site for the variable descriptions)\n",
    "https://scikit-learn.org/stable/datasets/index.html#boston-dataset\n",
    "\n",
    "Import this morning's libraries: `numpy` and `sklearn.datasets`.\n",
    "\n",
    "Read the dataset documentation.\n",
    "\n",
    "Load the Boston dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn import datasets\n",
    "\n",
    "# load_boston was removed in scikit-learn 1.2; this cell needs an older version.\n",
    "boston = datasets.load_boston()\n",
    "X, y = boston.data, boston.target  # feature matrix and target prices (MEDV)\n",
    "feature_names = boston.feature_names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploratory analysis and dataset preparation\n",
    "Study the correlations using `np.corrcoef`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Split the Boston dataset\n",
    "\n",
    "To do this, use the scikit-learn function `sklearn.model_selection.train_test_split`. Import this function and use it to split the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear regression\n",
    "A classic model: not very powerful, but interpretable. It is fitted by minimizing the mean squared error, which makes it very sensitive to outliers.\n",
    "Find the model in scikit-learn."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fit the model on the Boston data and display the coefficient of each feature. Which features are significant?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Regression tree\n",
    "![](https://i0.wp.com/freakonometrics.hypotheses.org/files/2015/06/boosting-algo-3.gif?zoom=2&w=456&ssl=1)\n",
    "\n",
    "Regression trees are very powerful models that split the feature space into regions in which every point gets the same predicted output. The scikit-learn implementation is documented at: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.gridspec as gridspec\n",
    "from matplotlib import pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try a maximum depth of 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try a maximum depth of 5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try a maximum depth of 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random forest\n",
    "Find the model in scikit-learn\n",
    "image\n",
    "model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try with 3 trees"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try with 10 trees"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try with 100 trees"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare with the regression trees. What are the advantages?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_Optional:_ plot the results with 1 tree, 3 trees and 100 trees"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## If you are bored\n",
    "Compare the different models by running all of them on the test set\n",
    "\n",
    "Run a regression on the output of a PCA (tricky)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}