Commit 015beae0 authored by Mathilde Rineau

Add new file

parent 20ab0835
{
"cells": [
{
"cell_type": "markdown",
"id": "b04d6b74",
"metadata": {},
"source": [
"# ASO1 Problem\n",
"\n",
"Authors: Remy Huet, Mathilde Rineau\n",
"\n",
"Date 24/10/2021\n"
]
},
{
"cell_type": "markdown",
"id": "ce33a2fc",
"metadata": {},
"source": [
"Subject:\n",
"We have the monthly retail debit card usage in Iceland (million ISK) from January 2000 to December 2012.\n",
"We want to estimate the cumulative debit card usage during the first four months of 2013."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd017ee6",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f8d9f85",
"metadata": {},
"outputs": [],
"source": [
"# read the CSV file, using the first column as a date index\n",
"ts = pd.read_csv(\"debitcards.csv\", index_col=0, parse_dates=True)\n",
"print(ts)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3686021",
"metadata": {},
"outputs": [],
"source": [
"# sanity checks on the data: 156 monthly values, indexed by date\n",
"assert ts.shape == (156, 1)\n",
"assert isinstance(ts.index, pd.DatetimeIndex)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5271112",
"metadata": {},
"outputs": [],
"source": [
"# MS: month start frequency\n",
"ts.index.freq = \"MS\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "919229a4",
"metadata": {},
"outputs": [],
"source": [
"plt.plot(ts.V1)"
]
},
{
"cell_type": "markdown",
"id": "615221ca",
"metadata": {},
"source": [
"By plotting the data, we can see that the mean and the standard deviation do not seem to be constant, so the time series is probably not stationary.\n",
"To confirm this, we perform an augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root, i.e. that it is not stationary."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "61b00901",
"metadata": {},
"outputs": [],
"source": [
"from statsmodels.tsa.stattools import adfuller\n",
"#perform augmented Dickey-Fuller test\n",
"test = adfuller(ts.V1, autolag='AIC')\n",
"pvalue = test[1]\n",
"print(pvalue)"
]
},
{
"cell_type": "markdown",
"id": "3c24d32d",
"metadata": {},
"source": [
"The p-value is approximately 0.78, much higher than 0.005, so we cannot reject the null hypothesis of a unit root: the data does not appear to be stationary."
]
},
{
"cell_type": "markdown",
"id": "e19bf18c",
"metadata": {},
"source": [
"We use a Box-Cox transformation to stabilize the variance, in the hope of obtaining a stationary time series."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "edd8dd63",
"metadata": {},
"outputs": [],
"source": [
"from scipy import stats\n",
"# Box-Cox transformation\n",
"ts_V1_transform, ts_V1_lambda = stats.boxcox(ts.V1)\n",
"plt.plot(ts_V1_transform)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d46d9cc8",
"metadata": {},
"outputs": [],
"source": [
"#perform augmented Dickey-Fuller test\n",
"test = adfuller(ts_V1_transform, autolag='AIC')\n",
"pvalue = test[1]\n",
"print(pvalue)"
]
},
{
"cell_type": "markdown",
"id": "cb24ff00",
"metadata": {},
"source": [
"The p-value is approximately 0.71, much higher than 0.005, so we again cannot reject the null hypothesis of a unit root."
]
},
{
"cell_type": "markdown",
"id": "25f27bd7",
"metadata": {},
"source": [
"The Box-Cox transformation failed to give us a stationary time series."
]
},
{
"cell_type": "markdown",
"id": "72295080",
"metadata": {},
"source": [
"We difference the series, i.e. compute $Y_t - Y_{t-1}$ (the integrated part of an ARIMA model), in order to obtain a stationary series."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6851e0d",
"metadata": {},
"outputs": [],
"source": [
"# First difference Y_t - Y_{t-1}; the first row has no predecessor and is dropped\n",
"ts_stationary = ts.diff().dropna()\n",
"print(ts_stationary)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c602973d",
"metadata": {},
"outputs": [],
"source": [
"plt.plot(ts_stationary)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc7d373f",
"metadata": {},
"outputs": [],
"source": [
"# ADF test on the differenced series (adfuller is already imported above)\n",
"test = adfuller(ts_stationary.V1, autolag='AIC')\n",
"pvalue = test[1]\n",
"print(pvalue)"
]
},
{
"cell_type": "markdown",
"id": "616ad4c8",
"metadata": {},
"source": [
"The p-value is approximately 0.67, much higher than 0.005, so we cannot reject the null hypothesis of a unit root."
]
},
{
"cell_type": "markdown",
"id": "ce066bc0",
"metadata": {},
"source": [
"Differencing failed to give us a stationary time series."
]
},
{
"cell_type": "markdown",
"id": "bb89b4e9",
"metadata": {},
"source": [
"We aggregate the data into 4-month periods in order to reduce the variability."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cd01d964",
"metadata": {},
"outputs": [],
"source": [
"# Sum the monthly values over consecutive 4-month blocks (156 months -> 39 blocks)\n",
"ts_quartil = pd.DataFrame(\n",
"    {\"V1\": ts.V1.to_numpy().reshape(-1, 4).sum(axis=1)},\n",
"    index=pd.date_range(\"2000-01-01\", periods=39, freq=\"4MS\"),\n",
")\n",
"plt.plot(ts_quartil)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82dc234e",
"metadata": {},
"outputs": [],
"source": [
"test = adfuller(ts_quartil.V1.astype(float), autolag='AIC')\n",
"pvalue = test[1]\n",
"print(pvalue)"
]
},
{
"cell_type": "markdown",
"id": "0b482df4",
"metadata": {},
"source": [
"The p-value is approximately 0.68, much higher than 0.005, so we cannot reject the null hypothesis of a unit root."
]
},
{
"cell_type": "markdown",
"id": "1502bd9b",
"metadata": {},
"source": [
"This change also failed to give us a stationary time series, so we difference the aggregated series."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eadb0062",
"metadata": {},
"outputs": [],
"source": [
"# First difference of the 4-month series; the first row has no predecessor and is dropped\n",
"ts_stationary = ts_quartil.astype(float).diff().dropna()\n",
"plt.plot(ts_stationary)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "366acc41",
"metadata": {},
"outputs": [],
"source": [
"# Label-based slices include both endpoints, so 2007-01-01 appears in both parts\n",
"ts_stationary_train = ts_stationary.loc[:\"2007-01-01\"]\n",
"ts_stationary_test = ts_stationary.loc[\"2007-01-01\":]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2bd2d5b",
"metadata": {},
"outputs": [],
"source": [
"test = adfuller(ts_stationary_train, autolag='AIC')\n",
"pvalue = test[1]\n",
"print(pvalue)\n",
"test = adfuller(ts_stationary_test, autolag='AIC')\n",
"pvalue = test[1]\n",
"print(pvalue)"
]
},
{
"cell_type": "markdown",
"id": "66abd1a8",
"metadata": {},
"source": [
"If we split the differenced series in two parts, the first part seems to be stationary: the p-value of the ADF test is approximately 0.0, lower than 0.005, so we reject the null hypothesis of a unit root.\n",
"For the second part the p-value is almost 1, so we cannot reject the null hypothesis and treat this part as non-stationary."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b35ea92",
"metadata": {},
"outputs": [],
"source": [
"print(type(ts_stationary_train))\n",
"plt.plot(ts_stationary_train)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70235c46",
"metadata": {},
"outputs": [],
"source": [
"from statsmodels.graphics.tsaplots import plot_pacf\n",
"from statsmodels.graphics.tsaplots import plot_acf\n",
"plot_acf(ts_stationary_train)\n"
]
},
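{
"cell_type": "code",
"execution_count": null,
"id": "edce5bba",
"metadata": {},
"outputs": [],
"source": [
"# The default call raised an error. The PACF can only be computed for lags\n",
"# below half the sample size, so we set `lags` explicitly.\n",
"n_lags = len(ts_stationary_train) // 2 - 1\n",
"plot_pacf(ts_stationary_train.V1.astype(float), lags=n_lags)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e111eb2c",
"metadata": {},
"outputs": [],
"source": [
"# SARIMAX must be imported from statsmodels before use\n",
"# (the bare name `sarimax` raises a NameError)\n",
"from statsmodels.tsa.statespace.sarimax import SARIMAX\n",
"model = SARIMAX(ts_stationary_train.V1.astype(float), order=(1, 0, 0))\n",
"model_fit = model.fit()"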
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ad11fc71",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0fba332",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9b66fb6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
%% Cell type:markdown id:b04d6b74 tags:
# ASO1 Problem
Authors: Remy Huet, Mathilde Rineau
Date 24/10/2021
%% Cell type:markdown id:ce33a2fc tags:
Subject:
We have the monthly retail debit card usage in Iceland (million ISK) from January 2000 to December 2012.
We want to estimate the cumulative debit card usage during the first four months of 2013.
%% Cell type:code id:bd017ee6 tags:
``` python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
```
%% Cell type:code id:9f8d9f85 tags:
``` python
# read the CSV file, using the first column as a date index
ts = pd.read_csv("debitcards.csv", index_col=0, parse_dates=True)
print(ts)
```
%% Cell type:code id:a3686021 tags:
``` python
# sanity checks on the data: 156 monthly values, indexed by date
assert ts.shape == (156, 1)
assert isinstance(ts.index, pd.DatetimeIndex)
```
%% Cell type:code id:c5271112 tags:
``` python
# MS: month start frequency
ts.index.freq = "MS"
```
%% Cell type:code id:919229a4 tags:
``` python
plt.plot(ts.V1)
```
%% Cell type:markdown id:615221ca tags:
By plotting the data, we can see that the mean and the standard deviation do not seem to be constant, so the time series is probably not stationary.
To confirm this, we perform an augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root, i.e. that it is not stationary.
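
The visual impression of a drifting mean and changing spread can be made more concrete with rolling statistics. The following is a minimal sketch on synthetic data (the series `demo` and the 12-month window are illustrative assumptions, not part of the notebook):

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with an upward trend (illustrative data, not debitcards.csv)
rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=156, freq="MS")
demo = pd.Series(np.linspace(100.0, 300.0, 156) + rng.normal(0.0, 5.0, 156), index=index)

# Mean and standard deviation over a sliding 12-month window
rolling_mean = demo.rolling(window=12).mean()
rolling_std = demo.rolling(window=12).std()

# For a stationary series both curves would stay roughly flat;
# here the rolling mean drifts upward with the trend.
print(rolling_mean.dropna().iloc[[0, -1]])
```

Applied to `ts.V1`, the same two curves could be plotted alongside the raw series to support the visual diagnosis.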
%% Cell type:code id:61b00901 tags:
``` python
from statsmodels.tsa.stattools import adfuller
#perform augmented Dickey-Fuller test
test = adfuller(ts.V1, autolag='AIC')
pvalue = test[1]
print(pvalue)
```
%% Cell type:markdown id:3c24d32d tags:
The p-value is approximately 0.78, much higher than 0.005, so we cannot reject the null hypothesis of a unit root: the data does not appear to be stationary.
%% Cell type:markdown id:e19bf18c tags:
We use a Box-Cox transformation to stabilize the variance, in the hope of obtaining a stationary time series.
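
Any forecast made on Box-Cox-transformed data has to be mapped back to the original scale. A small sketch of the round trip, assuming `scipy.special.inv_boxcox` for the inverse and using synthetic positive values rather than the debit card data:

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Synthetic strictly positive data (Box-Cox requires positive values)
rng = np.random.default_rng(1)
values = rng.lognormal(mean=9.0, sigma=0.3, size=156)

# Forward transform: lambda is estimated by maximum likelihood
transformed, lam = stats.boxcox(values)

# Inverse transform with the same lambda recovers the original scale
restored = inv_boxcox(transformed, lam)
print(f"lambda = {lam:.3f}")
print(np.allclose(restored, values))
```

Keeping the fitted lambda (here `lam`, in the notebook `ts_V1_lambda`) is what makes the back-transformation possible.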
%% Cell type:code id:edd8dd63 tags:
``` python
from scipy import stats
# Box-Cox transformation
ts_V1_transform, ts_V1_lambda = stats.boxcox(ts.V1)
plt.plot(ts_V1_transform)
```
%% Cell type:code id:d46d9cc8 tags:
``` python
#perform augmented Dickey-Fuller test
test = adfuller(ts_V1_transform, autolag='AIC')
pvalue = test[1]
print(pvalue)
```
%% Cell type:markdown id:cb24ff00 tags:
The p-value is approximately 0.71, much higher than 0.005, so we again cannot reject the null hypothesis of a unit root.
%% Cell type:markdown id:25f27bd7 tags:
The Box-Cox transformation failed to give us a stationary time series.
%% Cell type:markdown id:72295080 tags:
We difference the series, i.e. compute $Y_t - Y_{t-1}$ (the integrated part of an ARIMA model), in order to obtain a stationary series.
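
As a small worked example of differencing and of how it is undone, the sketch below uses a six-point toy series (the values are made up for illustration). The original series is recovered from its first value plus the cumulative sum of the differences, which is the step needed to turn forecasts of the differences back into forecasts of the level:

```python
import numpy as np
import pandas as pd

index = pd.date_range("2000-01-01", periods=6, freq="MS")
y = pd.Series([10.0, 12.0, 15.0, 14.0, 18.0, 21.0], index=index)

# First difference: Y_t - Y_{t-1}; the first value has no predecessor
diff = y.diff().dropna()
print(diff.to_list())  # [2.0, 3.0, -1.0, 4.0, 3.0]

# Inverse operation: first value plus the running sum of the differences
rebuilt = pd.concat([y.iloc[:1], y.iloc[0] + diff.cumsum()])
print(np.allclose(rebuilt.to_numpy(), y.to_numpy()))
```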
%% Cell type:code id:a6851e0d tags:
``` python
# First difference Y_t - Y_{t-1}; the first row has no predecessor and is dropped
ts_stationary = ts.diff().dropna()
print(ts_stationary)
```
%% Cell type:code id:c602973d tags:
``` python
plt.plot(ts_stationary)
```
%% Cell type:code id:fc7d373f tags:
``` python
# ADF test on the differenced series (adfuller is already imported above)
test = adfuller(ts_stationary.V1, autolag='AIC')
pvalue = test[1]
print(pvalue)
```
%% Cell type:markdown id:616ad4c8 tags:
The p-value is approximately 0.67, much higher than 0.005, so we cannot reject the null hypothesis of a unit root.
%% Cell type:markdown id:ce066bc0 tags:
Differencing failed to give us a stationary time series.
%% Cell type:markdown id:bb89b4e9 tags:
We aggregate the data into 4-month periods in order to reduce the variability.
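
One way to sketch this aggregation, under the assumption that consecutive, non-overlapping blocks of four months are summed, is to group the rows by their block number. The series `monthly` is synthetic (values 1 to 156) so that the sums are easy to check by hand:

```python
import numpy as np
import pandas as pd

index = pd.date_range("2000-01-01", periods=156, freq="MS")
monthly = pd.Series(np.arange(1.0, 157.0), index=index)

# Rows 0-3 belong to block 0, rows 4-7 to block 1, and so on
block_id = np.arange(len(monthly)) // 4
block_sums = monthly.groupby(block_id).sum()

# Label each block with the date of its first month
block_sums.index = index[::4]
print(block_sums.head(3))  # 10.0, 26.0, 42.0
```

With 156 months this yields 39 blocks, each starting in January, May, or September. The aggregation also matches the target of the exercise, which is a 4-month total.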
%% Cell type:code id:cd01d964 tags:
``` python
# Sum the monthly values over consecutive 4-month blocks (156 months -> 39 blocks)
ts_quartil = pd.DataFrame(
    {"V1": ts.V1.to_numpy().reshape(-1, 4).sum(axis=1)},
    index=pd.date_range("2000-01-01", periods=39, freq="4MS"),
)
plt.plot(ts_quartil)
```
%% Cell type:code id:82dc234e tags:
``` python
test = adfuller(ts_quartil.V1.astype(float), autolag='AIC')
pvalue = test[1]
print(pvalue)
```
%% Cell type:markdown id:0b482df4 tags:
The p-value is approximately 0.68, much higher than 0.005, so we cannot reject the null hypothesis of a unit root.
%% Cell type:markdown id:1502bd9b tags:
This change also failed to give us a stationary time series, so we difference the aggregated series.
%% Cell type:code id:eadb0062 tags:
``` python
# First difference of the 4-month series; the first row has no predecessor and is dropped
ts_stationary = ts_quartil.astype(float).diff().dropna()
plt.plot(ts_stationary)
```
%% Cell type:code id:366acc41 tags:
``` python
# Label-based slices include both endpoints, so 2007-01-01 appears in both parts
ts_stationary_train = ts_stationary.loc[:"2007-01-01"]
ts_stationary_test = ts_stationary.loc["2007-01-01":]
```
%% Cell type:code id:f2bd2d5b tags:
``` python
test = adfuller(ts_stationary_train, autolag='AIC')
pvalue = test[1]
print(pvalue)
test = adfuller(ts_stationary_test, autolag='AIC')
pvalue = test[1]
print(pvalue)
```
%% Cell type:markdown id:66abd1a8 tags:
If we split the differenced series in two parts, the first part seems to be stationary: the p-value of the ADF test is approximately 0.0, lower than 0.005, so we reject the null hypothesis of a unit root.
For the second part the p-value is almost 1, so we cannot reject the null hypothesis and treat this part as non-stationary.
%% Cell type:code id:4b35ea92 tags:
``` python
print(type(ts_stationary_train))
plt.plot(ts_stationary_train)
```
%% Cell type:code id:70235c46 tags:
``` python
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(ts_stationary_train)
```
%% Cell type:code id:edce5bba tags:
``` python
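# The default call raised an error. The PACF can only be computed for lags
# below half the sample size, so we set `lags` explicitly.
n_lags = len(ts_stationary_train) // 2 - 1
plot_pacf(ts_stationary_train.V1.astype(float), lags=n_lags)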
```
%% Cell type:code id:e111eb2c tags:
``` python
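# SARIMAX must be imported from statsmodels before use
# (the bare name `sarimax` raises a NameError)
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(ts_stationary_train.V1.astype(float), order=(1, 0, 0))
model_fit = model.fit()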
```