Commit d923e6b2 authored by Sylvain Marchienne

Merge branch 'TP_python' into 'master'

TP Python

See merge request !1
parents 81d1a4e8 fcaa91c3
# TP1 Monday 21/01/2019
Today you will get hands-on with Python and a few of its libraries:
* Numpy
* Matplotlib
* Scikit-Learn

Start with the notebook `python-numpy-matplotlib.ipynb` (~1 hour, no more), then the notebooks in the `machine learning` folder, beginning with `05.00-Machine-Learning.ipynb`.
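
Before you start, you can check that the three libraries are installed and importable (a minimal sanity check; the versions printed will depend on your environment):

```python
# Quick sanity check of the required libraries
import numpy as np
import matplotlib
import sklearn

print("numpy", np.__version__)
print("matplotlib", matplotlib.__version__)
print("scikit-learn", sklearn.__version__)
```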
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deuxième partie, introduction au Machine Learning\n",
"Les notebooks qui suivent sont issu d'un livre \"Python Data Science Handbook\" ( https://github.com/jakevdp/PythonDataScienceHandbook ). L'auteur est l'un des plus grands contributeurs au projet Pandas, que vous aurez l'occasion d'utiliser pendant votre projet. Son livre est une référence et il propose de nombeux notebooks sur son Github pour apprendre. Nous vous en avons sélectionné quelques uns pour introduire le Machine Learning. Ne vous inquiétez pas si vous ne comprenez pas tout, vous aurez le temps de jouer avec scikit-learn et de mieux comprendre demain avec Sylvain Rousseau.\n",
"\n",
"Suivez le notebook en prenant soin de comprendre les explications et le code (quand il y en a). Si vous avez besoin d'explications, n'hésitez pas à parler aux tuteurs et/ou poser vos questions sur slack ! Les tuteurs ne sont pas forcément experts et ne pourront pas répondre à toutes vos questions, mais lancer la réflexion avec eux et les autres étudiants est bénéfique !"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In many ways, machine learning is the primary means by which data science manifests itself to the broader world.\n",
"Machine learning is where these computational and algorithmic skills of data science meet the statistical thinking of data science, and the result is a collection of approaches to inference and data exploration that are not about effective theory so much as effective computation.\n",
"\n",
"The term \"machine learning\" is sometimes thrown around as if it is some kind of magic pill: *apply machine learning to your data, and all your problems will be solved!*\n",
"As you might expect, the reality is rarely this simple.\n",
"While these methods can be incredibly powerful, to be effective they must be approached with a firm grasp of the strengths and weaknesses of each method, as well as a grasp of general concepts such as bias and variance, overfitting and underfitting, and more.\n",
"\n",
"This chapter will dive into practical aspects of machine learning, primarily using Python's [Scikit-Learn](http://scikit-learn.org) package.\n",
"This is not meant to be a comprehensive introduction to the field of machine learning; that is a large subject and necessitates a more technical approach than we take here. Rather, the goals of this chapter are:\n",
"\n",
"- To introduce the fundamental vocabulary and concepts of machine learning.\n",
"- To introduce the Scikit-Learn API and show some examples of its use.\n",
"- To take a deeper dive into the details of several of the most important machine learning approaches, and develop an intuition into how they work and when and where they are applicable."
]
},
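{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small taste of what is coming, the next cell sketches the core Scikit-Learn estimator pattern (instantiate, `fit`, `predict`/`score`). This cell is an added illustration, not part of the original chapter; the choice of `KNeighborsClassifier` and the iris dataset is an arbitrary stand-in."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch of the Scikit-Learn estimator API (illustrative example,\n",
"# not from the original book chapter)\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"# Load a small toy dataset and hold out a test set\n",
"X, y = load_iris(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n",
"\n",
"# Every Scikit-Learn estimator follows the same pattern:\n",
"model = KNeighborsClassifier(n_neighbors=3)  # 1. instantiate with hyperparameters\n",
"model.fit(X_train, y_train)                  # 2. fit to the training data\n",
"print('test accuracy:', model.score(X_test, y_test))  # 3. evaluate on held-out data"
]
},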
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Allez au prochain notebook** \"05.01-What-Is-Machine-Learning\""
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
},
"toc": {
"colors": {
"hover_highlight": "#DAA520",
"navigate_num": "#000000",
"navigate_text": "#333333",
"running_highlight": "#FF0000",
"selected_highlight": "#FFD700",
"sidebar_border": "#EEEEEE",
"wrapper_background": "#FFFFFF"
},
"moveMenuLeft": true,
"nav_menu": {
"height": "48px",
"width": "252px"
},
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 4,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false,
"widenNotebook": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}
order,name,height(cm)
1,George Washington,189
2,John Adams,170
3,Thomas Jefferson,189
4,James Madison,163
5,James Monroe,183
6,John Quincy Adams,171
7,Andrew Jackson,185
8,Martin Van Buren,168
9,William Henry Harrison,173
10,John Tyler,183
11,James K. Polk,173
12,Zachary Taylor,173
13,Millard Fillmore,175
14,Franklin Pierce,178
15,James Buchanan,183
16,Abraham Lincoln,193
17,Andrew Johnson,178
18,Ulysses S. Grant,173
19,Rutherford B. Hayes,174
20,James A. Garfield,183
21,Chester A. Arthur,183
23,Benjamin Harrison,168
25,William McKinley,170
26,Theodore Roosevelt,178
27,William Howard Taft,182
28,Woodrow Wilson,180
29,Warren G. Harding,183
30,Calvin Coolidge,178
31,Herbert Hoover,182
32,Franklin D. Roosevelt,188
33,Harry S. Truman,175
34,Dwight D. Eisenhower,179
35,John F. Kennedy,183
36,Lyndon B. Johnson,193
37,Richard Nixon,182
38,Gerald Ford,183
39,Jimmy Carter,177
40,Ronald Reagan,185
41,George H. W. Bush,188
42,Bill Clinton,188
43,George W. Bush,182
44,Barack Obama,185
"state","abbreviation"
"Alabama","AL"
"Alaska","AK"
"Arizona","AZ"
"Arkansas","AR"
"California","CA"
"Colorado","CO"
"Connecticut","CT"
"Delaware","DE"
"District of Columbia","DC"
"Florida","FL"
"Georgia","GA"
"Hawaii","HI"
"Idaho","ID"
"Illinois","IL"
"Indiana","IN"
"Iowa","IA"
"Kansas","KS"
"Kentucky","KY"
"Louisiana","LA"
"Maine","ME"
"Montana","MT"
"Nebraska","NE"
"Nevada","NV"
"New Hampshire","NH"
"New Jersey","NJ"
"New Mexico","NM"
"New York","NY"
"North Carolina","NC"
"North Dakota","ND"
"Ohio","OH"
"Oklahoma","OK"
"Oregon","OR"
"Maryland","MD"
"Massachusetts","MA"
"Michigan","MI"
"Minnesota","MN"
"Mississippi","MS"
"Missouri","MO"
"Pennsylvania","PA"
"Rhode Island","RI"
"South Carolina","SC"
"South Dakota","SD"
"Tennessee","TN"
"Texas","TX"
"Utah","UT"
"Vermont","VT"
"Virginia","VA"
"Washington","WA"
"West Virginia","WV"
"Wisconsin","WI"
"Wyoming","WY"
state,area (sq. mi)
Alabama,52423
Alaska,656425
Arizona,114006
Arkansas,53182
California,163707
Colorado,104100
Connecticut,5544
Delaware,1954
Florida,65758
Georgia,59441
Hawaii,10932
Idaho,83574
Illinois,57918
Indiana,36420
Iowa,56276
Kansas,82282
Kentucky,40411
Louisiana,51843
Maine,35387
Maryland,12407
Massachusetts,10555
Michigan,96810
Minnesota,86943
Mississippi,48434
Missouri,69709
Montana,147046
Nebraska,77358
Nevada,110567
New Hampshire,9351
New Jersey,8722
New Mexico,121593
New York,54475
North Carolina,53821
North Dakota,70704
Ohio,44828
Oklahoma,69903
Oregon,98386
Pennsylvania,46058
Rhode Island,1545
South Carolina,32007
South Dakota,77121
Tennessee,42146
Texas,268601
Utah,84904
Vermont,9615
Virginia,42769
Washington,71303
West Virginia,24231
Wisconsin,65503
Wyoming,97818
District of Columbia,68
Puerto Rico,3515
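
The three CSV files above (president heights, state abbreviations, state areas) serve as small example datasets in the notebooks. As a rough sketch of how they can be loaded and combined with Pandas (the file names below are assumptions, since the diff view does not show them; adjust the paths to your clone):

```python
import pandas as pd

# Assumed file names -- the diff does not display them explicitly
heights = pd.read_csv('president_heights.csv')
abbrevs = pd.read_csv('state-abbrevs.csv')
areas = pd.read_csv('state-areas.csv')

# Join the two state tables on their common 'state' column
states = pd.merge(abbrevs, areas, on='state', how='left')
print(states.head())

# Simple summary statistics on the heights column
print(heights['height(cm)'].describe())
```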
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from ipywidgets import interact


def visualize_tree(estimator, X, y, boundaries=True,
                   xlim=None, ylim=None, ax=None):
    """Fit the estimator on (X, y) and plot its decision regions in 2D."""
    ax = ax or plt.gca()

    # Plot the training points
    ax.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap='viridis',
               clim=(y.min(), y.max()), zorder=3)
    ax.axis('tight')
    ax.axis('off')
    if xlim is None:
        xlim = ax.get_xlim()
    if ylim is None:
        ylim = ax.get_ylim()

    # Fit the estimator and evaluate it on a 200x200 grid
    estimator.fit(X, y)
    xx, yy = np.meshgrid(np.linspace(*xlim, num=200),
                         np.linspace(*ylim, num=200))
    Z = estimator.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    n_classes = len(np.unique(y))
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3,
                levels=np.arange(n_classes + 1) - 0.5,
                cmap='viridis', clim=(y.min(), y.max()),
                zorder=1)
    ax.set(xlim=xlim, ylim=ylim)

    # Recursively draw the axis-aligned splits of the fitted tree
    def plot_boundaries(i, xlim, ylim):
        if i >= 0:
            tree = estimator.tree_
            if tree.feature[i] == 0:
                # Split on feature 0: vertical line at the threshold
                ax.plot([tree.threshold[i], tree.threshold[i]],
                        ylim, '-k', zorder=2)
                plot_boundaries(tree.children_left[i],
                                [xlim[0], tree.threshold[i]], ylim)
                plot_boundaries(tree.children_right[i],
                                [tree.threshold[i], xlim[1]], ylim)
            elif tree.feature[i] == 1:
                # Split on feature 1: horizontal line at the threshold
                ax.plot(xlim, [tree.threshold[i], tree.threshold[i]],
                        '-k', zorder=2)
                plot_boundaries(tree.children_left[i], xlim,
                                [ylim[0], tree.threshold[i]])
                plot_boundaries(tree.children_right[i], xlim,
                                [tree.threshold[i], ylim[1]])

    if boundaries:
        plot_boundaries(0, xlim, ylim)


def plot_tree_interactive(X, y):
    """Interactive widget: vary the tree depth and redraw the decision regions."""
    def interactive_tree(depth=5):
        clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
        visualize_tree(clf, X, y)

    return interact(interactive_tree, depth=[1, 5])


def randomized_tree_interactive(X, y):
    """Interactive widget: refit a deep tree on random 75% subsets of the data."""
    N = int(0.75 * X.shape[0])
    xlim = (X[:, 0].min(), X[:, 0].max())
    ylim = (X[:, 1].min(), X[:, 1].max())

    def fit_randomized_tree(random_state=0):
        clf = DecisionTreeClassifier(max_depth=15)
        i = np.arange(len(y))
        rng = np.random.RandomState(random_state)
        rng.shuffle(i)
        visualize_tree(clf, X[i[:N]], y[i[:N]], boundaries=False,
                       xlim=xlim, ylim=ylim)

    interact(fit_randomized_tree, random_state=[0, 100])
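
# ----------------------------------------------------------------------
# Hypothetical usage sketch (not part of the original helper file).
# Assuming this module is saved as e.g. helpers_05_08.py (the file name
# is not shown in this diff view), it would be used from a notebook like:
#
#     from helpers_05_08 import plot_tree_interactive
#     from sklearn.datasets import make_blobs
#
#     X, y = make_blobs(n_samples=300, centers=4,
#                       random_state=0, cluster_std=1.0)
#     plot_tree_interactive(X, y)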