"First, we will generate highly correlated data, containing a sample X (multidim) and a target y (one dim).\n",

"\n",

"#### Data generation first function: generate_data\n",

"We write a function for this.\n",

"Its parameters are :\n",

"- n_samples the number of samples\n",

...

...

@@ -58,7 +58,25 @@

"- For the other dimensions of X, noted i, the value will be calculated as follow :\n",

" - We generate a number from a normal law N(i / 2, 1)\n",

" - We add it to the value of the first column\n",

"- For Y, we select 2 over 3 values of X and we sum them"

"- For Y, we select 2 over 3 values of X and we sum them\n",

"\n",

"#### Data generation second function: generate_data_2\n",

"We have written a second function, which generate another highly correlated data set in order to compare our results.\n",

"Its parameters are\n",

"- n_samples the number of samples \n",

"- n_features the number of features in X\n",

"and the outputs X and y\n",

"\n",

"For this purpose, we proceed in 4 steps:\n",

"\n",

"- we generate samples of a geometric law of parameter p = 0.5, these samples are stored in the first column of X\n",

"- for the other columns of X we do\n",

" - we generate randomly a parameter p between 0 and 1\n",

" - we generate samples of a geometric law of parameter p\n",

" - we add this samples to the sum of the previous column\n",

" \n",

"At the end, we have the matrix X where each column `Xi` is a sum of a samples generated from a geometric law and the previous columns `X0+...+Xi-1`.\n",

"We generate `y` as the mean of `X` on the axis 1."