Commit ed9386e5 by Rémy Huet 💻

### First work, read description!

- Some introduction and imports
- Generation of highly correlated data
- Training of a lasso regression on this data

Warning!! The data may be too highly correlated: only one or two features are selected.
TODO: test changes to the data to obtain "better bad" results with Lasso
parent 9074a236
%% Cell type:markdown id: tags:

# AOS1 assignment
## Make elastic net outshine the Lasso

Authors: Mathilde Rineau, Rémy Huet

### Introduction

The aim of this work is to demonstrate experimentally that elastic net regularization outshines Lasso regularization in some cases.

The Lasso is known to be unstable when used on highly correlated data. Indeed, Lasso regularization may ignore some features entirely (by setting their weights in the regression to 0). When the data is highly correlated, small changes in the sample can change which features are selected (what we call instability).

In contrast, elastic net regression should also be able to ignore some features, but with more stability than the Lasso.

In this work, we will construct a dataset of highly correlated data to demonstrate this.

%% Cell type:code id: tags:

```
import numpy as np

from sklearn.linear_model import Lasso, ElasticNet
```

%% Cell type:markdown id: tags:

### Data generation

First, we generate highly correlated data: a multidimensional sample X and a one-dimensional target y.

We write a function for this. Its parameters are:
- n_samples, the number of samples
- n_features, the number of features in X
- m, s, the parameters of the normal distribution used to generate the first feature

Its outputs are X and y.

We proceed in three steps:
- First, we generate the first dimension of X randomly from a normal distribution N(m, s).
- For the other dimensions of X, the value is computed as follows:
 - we draw a number from a normal distribution N(0, 1);
 - we add it to the value of the first column.
- For y, the value is computed as the mean of the values generated for X.

%% Cell type:code id: tags:

```
# /!\ THIS IS A FIRST TEST VERSION, COULD (AND WILL CERTAINLY) CHANGE
def generate_data(n_samples, n_features, m, s):
    X = np.empty((n_samples, n_features))

    # First feature: drawn from N(m, s).
    X[:, 0] = np.random.normal(m, s, size=n_samples)
    # Other features: the first feature plus N(0, 1) noise.
    for j in range(1, n_features):
        X[:, j] = X[:, 0] + np.random.normal(0, 1, size=n_samples)

    # Target: mean of the features.
    y = np.mean(X, axis=1)

    return X, y
```

%% Cell type:markdown id: tags:

### Demonstrate instability of Lasso

%% Cell type:code id: tags:

```
# TODO

########## TMP TESTS ##########
X, y = generate_data(1000, 50, 30, 3)
model = Lasso(alpha=1.0)
model.fit(X, y)
model.coef_
###############################
```

%% Cell type:markdown id: tags:

### Demonstrate stability of elastic net

%% Cell type:code id: tags:

```
# TODO
```
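One possible way to fill in the two TODO sections above (a sketch, not part of the original notebook: the bootstrap approach, the `alpha` and `l1_ratio` values, and the `selection_frequency` helper are all assumptions) is to refit each model on bootstrap resamples of the data and measure how often each feature is selected. A feature whose selection frequency is strictly between 0 and 1 is selected in some resamples but not others, which is exactly the instability described in the introduction.

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)

# Correlated data as in the notebook: the first feature is N(30, 3),
# every other feature is the first one plus N(0, 1) noise, and the
# target y is the row-wise mean of the features.
n_samples, n_features = 1000, 50
base = rng.normal(30, 3, size=n_samples)
X = base[:, None] + rng.normal(0, 1, size=(n_samples, n_features))
X[:, 0] = base
y = X.mean(axis=1)

def selection_frequency(model, X, y, n_resamples=20):
    """Fraction of bootstrap fits in which each feature gets a non-zero weight."""
    counts = np.zeros(X.shape[1])
    for _ in range(n_resamples):
        idx = rng.integers(0, len(y), size=len(y))
        model.fit(X[idx], y[idx])
        counts += model.coef_ != 0
    return counts / n_resamples

lasso_freq = selection_frequency(Lasso(alpha=0.1), X, y)
enet_freq = selection_frequency(ElasticNet(alpha=0.1, l1_ratio=0.5), X, y)

# Features selected in some resamples but not all signal instability.
print("Lasso unstable features:", np.sum((lasso_freq > 0) & (lasso_freq < 1)))
print("Enet unstable features:", np.sum((enet_freq > 0) & (enet_freq < 1)))
```

The expectation, if the elastic net is indeed more stable, is that its selection frequencies cluster near 0 or 1 while the Lasso's spread in between; the exact counts depend on the regularization strengths chosen.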